Abstract

In two languages, Linear Algebra and Lie Algebra, we describe the results of 
Kostant and Wallach on the fibre of matrices with prescribed eigenvalues of all 
leading principal submatrices. In addition, we present a brief introduction to ba- 
sic notions in Algebraic Geometry, Integrable Systems, and Lie Algebra aimed at 
specialists in Linear Algebra. 



Introduction 

The well-known meeting-place of linear algebra and Lie algebra is the classical matrix 
groups. This paper is not a survey of that theory, but is about a more specific confluence, 
and we consider only the general linear ("type a") case among the classical groups. 

For want of a standard name we designate as the Ritz values of a matrix the eigenvalues
of all its leading principal submatrices. In [KW1] and [KW2], Kostant and Wallach
studied the structure of the set ("fibre") of matrices with given Ritz values. They con- 
structed a certain commutative Lie group, which acts on the space of matrices and whose 
action preserves Ritz values; the second paper culminates by showing that this naturally 
leads to a particularly nice set of coordinates on the space of those matrices whose Ritz 
values satisfy some disjointness condition. 

Inspired by this work, Parlett and Strang [PS] studied such problems using bona fide
matrix theory and linear algebra. Later one of us [BNP] showed quite explicitly how to 
parameterize the space of matrices with given generic Ritz values, without invoking any 
Lie theory or algebraic geometry. However, hiding away the symmetry of the problem 
does have some drawbacks: while the coordinates are easy to define, it is not clear what 
they mean, or that they satisfy any natural properties. Thus the extra structure of Lie 
theory can give depth to the matrix theory.
Kostant and Wallach's group, and their parameterization of generic fibres, do appear
as such in the matrix-theoretic approach, but the properties (such as their version of the 
Gelfand-Kirillov theorem) that mark them as nice can only be seen by considering the 
global geometry of the space of matrices, rather than a single fibre at a time. 

The purpose of the present work is twofold. The primary purpose is expository. We 
need to introduce enough of the language of Lie theory to be able to state and apply some 
of Kostant and Wallach's results. We hope the reader will be convinced that geometrical 
intuition, using the machinery of Lie/algebraic group theory and algebraic geometry, 
while on the surface very abstract, can not only suggest the right way to think about a 
problem in linear algebra, but, in fact, tell us how to do the actual computation. 

We show how to recover BNP's construction using the language and results of Kostant 
and Wallach, and prove that the two sets of coordinates are, in fact, identical. 

1 Matrix picture 

1.1 Notation and basic facts 

In contrast to most papers on matrix theory, certain matrices will be denoted by lower-case
Roman letters, such as $x$, for compatibility with the notation used in [KW1, KW2].
However, sometimes lower-case Roman letters, such as $b$ and $c$, denote vectors, and
sometimes we will use them to denote scalars, such as $t$, etc.; the type of object will
always be unambiguous. For a square matrix $x$, the leading principal submatrix of
order $m$ will be denoted by $x_m$; in Matlab notation, $x_m = x(1{:}m, 1{:}m)$.

Let $E(x)$ denote the multiset of eigenvalues of $x$. The object of study is $\mathbb{C}^{n \times n}$ for a
fixed natural number $n$, but, since it will be endowed with extra (e.g., Lie) structure, we
use the standard notation $M(n)$. What is not standard is $\mathcal{R}(x)$, $x \in M(n)$.

Definition. The set of Ritz values of $x \in M(n)$ is the tuple $\mathcal{R}(x) = (E(x_1), E(x_2), \dots, E(x_n))$.

This name was chosen because, in numerical linear algebra, when $x$ is Hermitian,
$E(x_m)$, for $m < n$, is regarded as an approximation to a subset of $E(x_n)$, and, independently,
Rayleigh and Ritz showed that the former are optimal approximations (in various
senses) from the subspace spanned by the first $m$ columns of the identity matrix. See [P,
Chapter 11].

For Hermitian matrices there are interlacing conditions connecting $E(x_{m-1})$ and
$E(x_m)$. However, for $M(n)$, there are no constraints on $\mathcal{R}(x)$; any set of $\binom{n+1}{2}$ complex
numbers is $\mathcal{R}(y)$ for some $y \in M(n)$. Moreover, sharing Ritz values determines an
equivalence relation on $M(n)$, and we have

$$M(n) = \coprod_{\mathcal{R}} M_{\mathcal{R}}(n), \tag{1.1.1}$$

where $M_{\mathcal{R}}(n) = \{\, x \in M(n) \mid \mathcal{R}(x) = \mathcal{R} \,\}$. (The coproduct symbol $\coprod$ here just means
the set-theoretic disjoint union.) In geometric theory the equivalence class $M_{\mathcal{R}}(n)$ is
called a fibre,^1 and we will use this terminology. We have the following

^1 Specifically, a fibre of the map $x \mapsto \mathcal{R}(x)$ which assigns to each matrix its Ritz values.

Problem 1.1.2. Given $\mathcal{R}$, describe the fibre $M_{\mathcal{R}}(n)$.

The first observation is that $\mathcal{R}$ determines the diagonal entries uniquely:

$$x_{mm} = \operatorname{tr}(x_m) - \operatorname{tr}(x_{m-1}) = \sum_{\mu \in E(x_m)} \mu \;-\; \sum_{\mu \in E(x_{m-1})} \mu, \qquad 1 \le m \le n,$$

with the convention $\operatorname{tr}(x_0) = 0$. Thus all members of $M_{\mathcal{R}}(n)$ share the same diagonal.

Elementary conjugations. For matrix theorists it seems a challenge to generate the fibre
for a given $\mathcal{R}$. In the generic case all elements are similar to each other, and yet the
diagonal is fixed. What mappings $x \mapsto g x g^{-1}$, with $g \in \mathrm{GL}(n)$, preserve $\mathcal{R}$?

A little reflection suggests two types which we will call elementary conjugations:

(i) transposition (not conjugate transpose), $x \mapsto x^T$,

(ii) diagonal similarity, $x \mapsto d x d^{-1}$, $d \in \mathrm{GL}(n)$ diagonal.

These two are far too weak to generate the fibre; the effect of elementary conjugations 
upon the dual coordinates s which parameterize the fibre will be shown later. 
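
That the two elementary conjugations preserve Ritz values can be checked numerically. The following is a minimal sketch, not part of the original argument: it encodes the Ritz values of $x$ by the characteristic polynomials of its leading principal submatrices (an equivalent description), computed exactly with rational arithmetic. The function names and the sample matrix are our own choices for illustration.

```python
from fractions import Fraction

def charpoly(a):
    """Return [1, c_{n-1}, ..., c_0], the coefficients of det(lambda*I - a),
    computed exactly by the Faddeev-LeVerrier recursion."""
    n = len(a)
    a = [[Fraction(v) for v in row] for row in a]
    m = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # M_1 = I
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        am = [[sum(a[i][l] * m[l][j] for l in range(n)) for j in range(n)]
              for i in range(n)]
        c = -sum(am[i][i] for i in range(n)) / k      # c_{n-k} = -tr(A M_k) / k
        coeffs.append(c)
        m = [[am[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return coeffs

def ritz_data(x):
    """Characteristic polynomials of all leading principal submatrices of x."""
    return [charpoly([row[:m] for row in x[:m]]) for m in range(1, len(x) + 1)]

def transpose(x):
    return [list(col) for col in zip(*x)]

def diagonal_similarity(d, x):
    """Entries of d x d^{-1} for d = diag(d[0], ..., d[n-1])."""
    n = len(x)
    return [[Fraction(d[i]) * x[i][j] / d[j] for j in range(n)] for i in range(n)]

# Hypothetical sample matrix, chosen only for illustration.
x = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
same_under_transpose = ritz_data(transpose(x)) == ritz_data(x)
same_under_scaling = ritz_data(diagonal_similarity([2, 3, 5], x)) == ritz_data(x)
```

Both flags come out true for this sample, as they must, since $(x^T)_m = (x_m)^T$ and $(d x d^{-1})_m = d_m x_m d_m^{-1}$.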

To state the first significant result of Kostant and Wallach, we need the notion of a 
Hessenberg matrix. 

Definition. A matrix $H \in M(n)$ is upper Hessenberg if $H_{ij} = 0$ for $i > j + 1$. $H$ is said
to be unreduced if $H_{i+1,i} \ne 0$, $1 \le i \le n-1$. $H$ is said to be unit upper Hessenberg if
$H_{i+1,i} = 1$, $1 \le i \le n-1$.

The result of Kostant and Wallach is that upper Hessenberg matrices serve as a 
natural set of representatives of each M^{n). Formally, 

Theorem 1.1.3 ([KW1, Theorem 0.1 and Remark 0.3]). For any $\mathcal{R}$, $M_{\mathcal{R}}(n)$ contains
exactly one unit upper Hessenberg matrix.
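
For $n = 2$ the statement can be verified by hand. The following worked computation is added as an illustration; the symbols $\mu, \lambda_1, \lambda_2$ are our own labels for the prescribed Ritz values $E(x_1) = \{\mu\}$ and $E(x_2) = \{\lambda_1, \lambda_2\}$.

```latex
H = \begin{pmatrix} h_{11} & h_{12} \\ 1 & h_{22} \end{pmatrix}, \qquad
h_{11} = \mu, \qquad
h_{22} = \lambda_1 + \lambda_2 - \mu \quad (\text{trace}),
\\[4pt]
\det H = \mu(\lambda_1 + \lambda_2 - \mu) - h_{12} = \lambda_1 \lambda_2
\;\Longrightarrow\;
h_{12} = -(\mu - \lambda_1)(\mu - \lambda_2).
```

Every entry is forced, so the unit upper Hessenberg representative is unique in this case; note that $h_{12}$ vanishes exactly when $\mu$ coincides with $\lambda_1$ or $\lambda_2$.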

Remark. A matrix-oriented proof was given in [PS].

Another result of Kostant and Wallach, also established by elementary means in [PS],
is that when $\mathcal{R}$ is generic (defined below in Definition 1.2.1) then the strictly lower
triangular part of $x \in M_{\mathcal{R}}(n)$ determines uniquely the strictly upper triangular part,
and vice versa. Thus it is tempting to think of the strictly lower part as a suitable set
of coordinates for $x$ that is dual, or complementary, to $\mathcal{R}$. The parameter count $\binom{n}{2}$
is exactly right. For reasons that will be made clear below, this temptation must be
resisted.

The major result of [KW2] was to find a "nice" set of coordinates to specify the
members of $M_{\mathcal{R}}(n)$ for generic $\mathcal{R}$. They are given by tuples $s = (s^{(1)}, \dots, s^{(n-1)})$, with
$s^{(m)} \in \mathbb{C}^m$, but no entry can vanish, so we invoke $\mathbb{C}^\times$, the multiplicative group $\mathbb{C} \setminus \{0\}$,
and have $s^{(m)} \in (\mathbb{C}^\times)^m$. Thus Kostant and Wallach present a coordinate system $(\mathcal{R}, s)$
for the generic elements of $M(n)$ that is not familiar to matrix theorists. The goal of this
paper is to show the geometric meaning of those coordinates. In some sense this is an
instance of the Darboux coordinates $(q, p)$ in the Hamilton-Jacobi theory of mechanics.^2

^2 The coordinates $(\mathcal{R}, s)$ will be action-angle coordinates arising from an integrable system; see the section on integrable systems.

1.2 Eigenvalue disjointness



The simplest version of the theory, on which we will focus and for reasons we will explain, 
occurs in the "generic" case. Consider the following conditions on an n x n matrix x: 

(G1$_m$) The elements of $E(x_m)$ are distinct.

(G2$_m$) $E(x_m) \cap E(x_{m+1}) = \emptyset$.

The significance of these conditions will be discussed in Section 1.5.

Definition 1.2.1. We call (G1$_m$) and (G2$_m$) the "eigenvalue disjointness conditions." If
both (G1$_m$), $1 \le m \le n$, and (G2$_m$), $1 \le m \le n-1$, hold for $x \in M(n)$, we will call $x$
generic.

Definition.

$$M_\Omega(n) = \{\, x \in M(n) \mid x \text{ is generic} \,\}.$$



The complement of $M_\Omega(n)$ in $M(n)$ breaks up into pieces specified by how badly conditions
(G1$_m$) and (G2$_m$) are violated. Each such violation translates into the vanishing
of some polynomial in the entries of $x$, e.g., (G2$_1$) is false exactly when $x_{12} x_{21} = 0$, and
(G1$_2$) is false exactly when $(x_{11} - x_{22})^2 + 4 x_{12} x_{21} = 0$. The set $M_\Omega(n)$ is, therefore, a
(nonempty, therefore dense) Zariski-open subset of $M(n)$.^3 For this reason we sometimes
say that a matrix $x \in M_\Omega(n)$ has generic Ritz values. Often the term "generic" refers to
any dense open subset. For example, one would say the condition that a matrix be diagonalizable
is a generic condition. It is somewhat confusing to refer to "generic" matrices,
because there are many dense open subsets. In fact, since $M(n) = \mathbb{C}^{n^2}$ is an algebraic
variety, in the Zariski topology any nonempty open subset is dense. In this paper, for
the sake of brevity, "generic" will refer to the specific eigenvalue disjointness conditions
just described.
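
The two polynomial conditions quoted for the $2 \times 2$ block can be derived in a line each. The computation below is added as a worked illustration and uses only the definitions of (G2$_1$) and (G1$_2$).

```latex
% (G2_1) fails iff x_{11} is an eigenvalue of x_2:
\det(x_{11} I_2 - x_2)
  = \det\begin{pmatrix} 0 & -x_{12} \\ -x_{21} & x_{11} - x_{22} \end{pmatrix}
  = -x_{12}\, x_{21} = 0 .
\\[4pt]
% (G1_2) fails iff the discriminant of the characteristic polynomial of x_2 vanishes:
(x_{11} + x_{22})^2 - 4\,(x_{11} x_{22} - x_{12} x_{21})
  = (x_{11} - x_{22})^2 + 4\, x_{12}\, x_{21} = 0 .
```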



What is wrong with eigenvalues? A given set $\mathcal{R}$ of Ritz values may be designated in various
ways by a matrix theorist. We could write down the eigenvalues $E(x_1), E(x_2), \dots, E(x_n)$
in some specific order for each $m$. We could write down the set $\{P_1, P_2, \dots, P_n\}$ of
monic characteristic polynomials of $x_1, x_2, \dots, x_n$. We could write down the coefficients
of each $P_m$ other than the dominant one. The descriptions are equivalent. Life is not so
carefree for Lie theorists because there is no natural global meaning to "the $i$th eigenvalue
of $x$." For example, let

$x = x(t) =$

As $t$ goes from 0 to 1, $x(t)$ describes a smooth family of generic matrices. We may
diagonalize x(0) = (10) via 





^3 If the appellation "Zariski" is intimidating, do not fret. Our exposition eschews further mention of it.

Extending this to the family, we have




However, $\Lambda(0) \ne \Lambda(1)$ despite the fact that $x(0) = x(1)$. Hence there is no consistent
smooth global way to order the eigenvalues of a matrix. But Lie algebra is committed 
to smooth maps. 
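
The phenomenon is easy to reproduce numerically. The sketch below uses a hypothetical closed loop of generic $2 \times 2$ matrices, $x(t) = \begin{pmatrix} 0 & e^{2\pi i t} \\ 1 & 0 \end{pmatrix}$, chosen by us purely for illustration (it is not necessarily the family intended in the text). Following one eigenvalue continuously around the loop returns the other eigenvalue.

```python
import cmath

def x(t):
    """Hypothetical loop of 2x2 matrices: [[0, e^{2 pi i t}], [1, 0]]."""
    return [[0, cmath.exp(2j * cmath.pi * t)], [1, 0]]

def eigenvalues(m):
    """Both eigenvalues of a 2x2 matrix via the quadratic formula."""
    half_tr = (m[0][0] + m[1][1]) / 2
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(half_tr * half_tr - det)
    return half_tr + root, half_tr - root

def follow_eigenvalue(start, steps=400):
    """Track one eigenvalue continuously along t in [0, 1] by always
    picking the eigenvalue closest to the previous one."""
    current = start
    for k in range(1, steps + 1):
        current = min(eigenvalues(x(k / steps)), key=lambda z: abs(z - current))
    return current

start = 1.0                       # an eigenvalue of x(0) = [[0, 1], [1, 0]]
end = follow_eigenvalue(start)    # the continuation of that eigenvalue at t = 1
loop_closes = abs(x(1)[0][1] - x(0)[0][1]) < 1e-12
```

Here the loop closes up, yet the eigenvalue that started at $1$ arrives at $-1$ (up to rounding), so no continuous labelling of "first" and "second" eigenvalue exists along this family.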

In fact, Kostant and Wallach do give a global definition of "$i$th eigenvalue" by means of
a "covering" $M_\Omega(n, e)$ of $M_\Omega(n)$. This extra technicality is not needed for a description of
the fibres (it is only introduced in the second part [KW2] of their paper, which establishes
a "Gelfand-Kirillov theorem" for $M(n)$). We avoid this complication by considering a
single fibre $M_{\mathcal{R}}(n)$ with some ordering of each $E(x_m)$ already given.



1.3 The complementary coordinates 

Now we describe the complementary coordinates $s = (s_1, \dots, s_{n-1})$ for a generic fibre.

Consider a matrix $x$ with generic Ritz values. Write $\mathcal{R}(x) = (E(x_1), \dots, E(x_n))$ with a
fixed ordering for each $E(x_m)$. We will denote by

$$\Lambda_m = \operatorname{diag}(\mu_1^{(m)}, \dots, \mu_m^{(m)})$$

the diagonal matrix with the elements of $E(x_m)$ placed along the diagonal. For $1 \le m \le
n-1$, (G1$_m$) implies that $x_m$ is similar to $\Lambda_m$. Hence there exists a matrix $g_m \in \mathrm{GL}(m)$
such that

$$x_m = g_m \Lambda_m g_m^{-1}, \tag{1.3.1}$$

and it becomes unique if the last row of $g_m$ consists of ones. (Note that the last entry of an
eigenvector of $x_m$ must not vanish, since $(\lambda I_m - x_m)\begin{pmatrix} u \\ 0 \end{pmatrix} = 0$ implies $(\lambda I_{m-1} - x_{m-1})u = 0$,
but $x_{m-1}$ and $x_m$ are assumed to have no eigenvalues in common.) Then

$$(g_m^{-1} \oplus 1)\, x_{m+1}\, (g_m \oplus 1) = \begin{pmatrix} \Lambda_m & c_m \\ b_m^T & \delta_{m+1} \end{pmatrix} \tag{1.3.2}$$

(we consistently write rows as transposed columns), and our dual coordinates appear
in (1.3.2) as the entries of $b_m$. We call the pair $(b_m, c_m)$ the "arrow coordinates" of $x_{m+1}$.

Claim 1.3.3. $b_m$ is identical with Kostant and Wallach's coordinates $s^{(m)}$.

A proof is given in Section 3.1.

Notation. $\operatorname{diag}(v)$, $v \in \mathbb{C}^m$, denotes the diagonal matrix $\operatorname{diag}(v_1, v_2, \dots, v_m) \in M(m)$.



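It is a consequence of the generic conditions (G1$_m$) and (G2$_m$) that $\operatorname{diag}(b_m)\operatorname{diag}(c_m)$ is
invertible:

Theorem 1.3.4. $\operatorname{diag}(b_m)\operatorname{diag}(c_m) = -P_{m+1}(\Lambda_m)\,(P'_m(\Lambda_m))^{-1}$.

Proof. By (1.3.2),

$$\begin{aligned}
P_{m+1}(\lambda) &= \det(\lambda I_{m+1} - x_{m+1})
 = \det\begin{pmatrix} \lambda I_m - \Lambda_m & -c_m \\ -b_m^T & \lambda - \delta_{m+1} \end{pmatrix} \\
&= P_m(\lambda)\left[(\lambda - \delta_{m+1}) - b_m^T(\lambda I_m - \Lambda_m)^{-1} c_m\right] \\
&= P_m(\lambda)(\lambda - \delta_{m+1}) - b_m^T \operatorname{diag}\!\left(P_m^{(1)}(\lambda), \dots, P_m^{(m)}(\lambda)\right) c_m,
\end{aligned} \tag{1.3.5}$$

where $P_m^{(i)}(\lambda) = P_m(\lambda)/(\lambda - \mu_i^{(m)}) = \prod_{j \ne i}(\lambda - \mu_j^{(m)})$, so that $P_m^{(i)}(\mu_i^{(m)}) = P'_m(\mu_i^{(m)})$
and $P_m^{(i)}(\mu_j^{(m)}) = 0$ for $j \ne i$. Evaluating (1.3.5) at $\lambda = \mu_1^{(m)}, \mu_2^{(m)}, \dots, \mu_m^{(m)}$ gives

$$P_{m+1}(\Lambda_m) = -\operatorname{diag}(b_m)\, P'_m(\Lambda_m) \operatorname{diag}(c_m),$$

and diagonal matrices commute. □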
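
The identity of Theorem 1.3.4 can be tested on a small arrow matrix. The following is a minimal sketch in exact rational arithmetic; the particular numbers for $\Lambda_m$, $b_m$, $c_m$ and the corner entry are hypothetical values chosen by us, and the helper names are our own.

```python
from fractions import Fraction

def det(a):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))

def arrow(mu, b, c, delta):
    """Arrow matrix [[diag(mu), c], [b^T, delta]]."""
    m = len(mu)
    rows = [[mu[i] if i == j else Fraction(0) for j in range(m)] + [c[i]]
            for i in range(m)]
    return rows + [list(b) + [delta]]

def char_value(a, lam):
    """The value det(lam*I - a) of the characteristic polynomial at lam."""
    n = len(a)
    return det([[(lam if i == j else 0) - a[i][j] for j in range(n)]
                for i in range(n)])

# Hypothetical data: distinct diagonal entries, arbitrary border.
mu = [Fraction(v) for v in (1, 2, 4)]
b = [Fraction(v) for v in (1, -2, 3)]
c = [Fraction(v) for v in (5, 1, -1)]
a = arrow(mu, b, c, Fraction(7))

lhs = [b[i] * c[i] for i in range(3)]          # diagonal of diag(b) diag(c)
rhs = []
for i in range(3):
    dp = Fraction(1)                           # P'_m(mu_i) = prod_{k != i} (mu_i - mu_k)
    for k in range(3):
        if k != i:
            dp *= mu[i] - mu[k]
    rhs.append(-char_value(a, mu[i]) / dp)     # -P_{m+1}(mu_i) / P'_m(mu_i)
```

The two lists agree entry by entry, which is the content of the theorem for this sample.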

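Given all $b_m$, we can reconstruct a unique $x \in M_{\mathcal{R}}(n)$. First, a useful lemma:

Lemma 1.3.6. Consider a (down) arrow matrix

$$A = \begin{pmatrix} D & p \\ q^T & \delta \end{pmatrix} \in M(m+1),$$

where $D = \operatorname{diag}(d_i)$ and $A$ is similar to $\Lambda = \operatorname{diag}(\lambda_j)$, and $d_i$ and $\lambda_j$ are all distinct. It
is convenient to define the (rectangular and skew) matrix $\operatorname{Cauchy}(D, \Lambda)$,

$$\operatorname{Cauchy}(D, \Lambda)_{ij} = (d_i - \lambda_j)^{-1}$$

(usually Cauchy matrices are defined as $(d_i + \lambda_j)^{-1}$ with same-sized parameter sets). Also,
define ones to be an array all of whose entries are 1's; the shape of ones is dictated by
the context. Then the spectral factorization of $A$ is given by

$$A = Z^{-1} \Lambda Z,$$

where

$$Z^{-1} = \begin{pmatrix} -\operatorname{diag}(p)\operatorname{Cauchy}(D, \Lambda) \\ \text{ones} \end{pmatrix}, \qquad
Z = \Pi^{-1}\begin{pmatrix} \operatorname{Cauchy}(\Lambda, D)\operatorname{diag}(q) & \text{ones} \end{pmatrix},$$

$$\Pi = -\operatorname{Cauchy}(\Lambda, D)\operatorname{diag}(q)\operatorname{diag}(p)\operatorname{Cauchy}(D, \Lambda) + \text{ones},$$

$$\Pi_{ij} = 0 \ \text{ if } i \ne j, \qquad
\Pi_{jj} = 1 + \sum_i \frac{p_i q_i}{(d_i - \lambda_j)^2}
 = 1 - \sum_i \frac{\prod_{k}(d_i - \lambda_k)}{(d_i - \lambda_j)^2 \prod_{k \ne i}(d_i - d_k)}$$

(thus $\Pi$ is independent of $p$ and $q$), i.e., we have found the eigenvectors of $A$.

Proof. The distinctness of the $d_i$ and $\lambda_j$ implies that the last entry of an eigenvector of $A$
must be non-zero. If $\mu$ is an eigenvalue of $A$, the equations

$$(\mu I - A)\begin{pmatrix} u \\ 1 \end{pmatrix} = 0, \qquad \begin{pmatrix} v^T & 1 \end{pmatrix}(\mu I - A) = 0$$

imply

$$u = (\mu I - D)^{-1} p, \qquad v = (\mu I - D)^{-1} q. \tag{1.3.7}$$

Using (1.3.7) for each eigenvalue $\lambda_j$ of $A$, we find

$$\text{column eigenvectors: } Z^{-1} = \begin{pmatrix} -\operatorname{diag}(p)\operatorname{Cauchy}(D, \Lambda) \\ \text{ones} \end{pmatrix}, \qquad
\text{row eigenvectors: } \Pi Z = \begin{pmatrix} \operatorname{Cauchy}(\Lambda, D)\operatorname{diag}(q) & \text{ones} \end{pmatrix}. \tag{1.3.8}$$

The product of the matrices in (1.3.8) is not $I$, but it must be diagonal since the $\lambda_j$ are
simple eigenvalues.^4

$$\begin{aligned}
\Pi = \Pi Z Z^{-1} &= [\text{row eigenvectors}]\,[\text{column eigenvectors}] \\
&= \begin{pmatrix} \operatorname{Cauchy}(\Lambda, D)\operatorname{diag}(q) & \text{ones} \end{pmatrix}
   \begin{pmatrix} -\operatorname{diag}(p)\operatorname{Cauchy}(D, \Lambda) \\ \text{ones} \end{pmatrix} \\
&= -\operatorname{Cauchy}(\Lambda, D)\operatorname{diag}(q)\operatorname{diag}(p)\operatorname{Cauchy}(D, \Lambda) + \text{ones}.
\end{aligned}$$

By Theorem 1.3.4,

$$q_i p_i = -\prod_{k}(d_i - \lambda_k) \Big/ \prod_{k \ne i}(d_i - d_k).$$

□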



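Now we are ready to specify the dual coordinates. Lemma 1.3.6 in our case gives

$$\begin{pmatrix} \Lambda_m & c_m \\ b_m^T & \delta_{m+1} \end{pmatrix}
 = \begin{pmatrix} -\operatorname{diag}(c_m)\operatorname{Cauchy}(\Lambda_m, \Lambda_{m+1}) \\ \text{ones} \end{pmatrix}
   \Lambda_{m+1}
   \begin{pmatrix} -\operatorname{diag}(c_m)\operatorname{Cauchy}(\Lambda_m, \Lambda_{m+1}) \\ \text{ones} \end{pmatrix}^{-1}.$$

Substituting this in (1.3.2) gives

$$x_{m+1} = g_{m+1}\Lambda_{m+1}g_{m+1}^{-1}$$

with

$$g_{m+1} = \begin{pmatrix} g_m & 0 \\ 0 & 1 \end{pmatrix}
 \begin{pmatrix} -\operatorname{diag}(c_m)\operatorname{Cauchy}(\Lambda_m, \Lambda_{m+1}) \\ \text{ones} \end{pmatrix}.$$

Using Theorem 1.3.4, $\operatorname{diag}(c_m) = -P_{m+1}(\Lambda_m)P'_m(\Lambda_m)^{-1}\operatorname{diag}(b_m)^{-1}$, and so we get the
$g$-recurrence

$$g_1 = (1),$$

$$g_{m+1} = \begin{pmatrix} -g_m\operatorname{diag}(c_m)\operatorname{Cauchy}(\Lambda_m,\Lambda_{m+1}) \\ \text{ones} \end{pmatrix} \tag{1.3.9}$$

$$\phantom{g_{m+1}} = \begin{pmatrix} g_m P_{m+1}(\Lambda_m)P'_m(\Lambda_m)^{-1}\operatorname{diag}(b_m)^{-1}\operatorname{Cauchy}(\Lambda_m,\Lambda_{m+1}) \\ \text{ones} \end{pmatrix}, \tag{1.3.10}$$

and we find $x = g_n\Lambda_n g_n^{-1}$.

^4 If $Au = \lambda_i u$ and $v^T A = \lambda_j v^T$ for some $u, v \ne 0$ with $\lambda_i \ne \lambda_j$, then $v^T u = 0$.

The recurrence (1.3.9) shows explicitly how $\Lambda_m$ and $\Lambda_{m+1}$, i.e., $E(x_m)$ and $E(x_{m+1})$,
determine the eigenvectors. Clearly (1.3.9) is simpler than (1.3.10), but we give preference
to $b_m$ over $c_m$ to align our results with those of Kostant and Wallach.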
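
The recurrence can be turned into a short program that builds a matrix with prescribed generic Ritz values and prescribed dual coordinates. The following is a sketch under stated assumptions, not a definitive implementation: all eigenvalues are taken rational so that exact arithmetic applies, the inputs are assumed generic with nonzero $b_m$, and the function names (`reconstruct`, `has_ritz_values`, and the helpers) are our own.

```python
from fractions import Fraction
from math import prod

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def inverse(a):
    """Gauss-Jordan inverse over the rationals (a is assumed invertible)."""
    n = len(a)
    aug = [list(a[i]) + [Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def reconstruct(ritz, b):
    """ritz[m-1]: the m eigenvalues of x_m; b[m-1]: the m nonzero entries of b_m.
    Returns x = g_n Lambda_n g_n^{-1}, with g_n built by the recurrence (1.3.10)."""
    ritz = [[Fraction(v) for v in level] for level in ritz]
    g = [[Fraction(1)]]                                          # g_1 = (1)
    for m in range(1, len(ritz)):
        lam, lam_next = ritz[m - 1], ritz[m]
        scaled_cauchy = []
        for i, mu in enumerate(lam):
            p_next = prod(mu - l for l in lam_next)                   # P_{m+1}(mu_i)
            dp = prod(mu - nu for k, nu in enumerate(lam) if k != i)  # P'_m(mu_i)
            w = p_next / (dp * b[m - 1][i])                           # equals -c_i
            scaled_cauchy.append([w / (mu - l) for l in lam_next])
        g = matmul(g, scaled_cauchy) + [[Fraction(1)] * (m + 1)]     # last row: ones
    n = len(ritz)
    lam_n = [[ritz[-1][i] if i == j else Fraction(0) for j in range(n)]
             for i in range(n)]
    return matmul(matmul(g, lam_n), inverse(g))

def det(a):
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))

def has_ritz_values(x, ritz):
    """Check det(mu*I - x_m) = 0 for every prescribed (distinct) mu in E(x_m)."""
    for m, level in enumerate(ritz, start=1):
        for mu in level:
            sub = [[(mu if i == j else 0) - x[i][j] for j in range(m)]
                   for i in range(m)]
            if det(sub) != 0:
                return False
    return True

# Hypothetical generic data for n = 3.
ritz = [[0], [1, -1], [2, -2, 3]]
x3 = reconstruct(ritz, [[2], [1, 3]])
x2 = reconstruct(ritz[:2], [[2]])
```

For the $2 \times 2$ data the result is $\begin{pmatrix} 0 & 1/2 \\ 2 & 0 \end{pmatrix}$, so the $(2,1)$ entry is exactly $b_1 = 2$ and the $(1,2)$ entry is $c_1$, with $b_1 c_1 = 1 = -P_2(0)/P_1'(0)$ as Theorem 1.3.4 requires.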

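We can put the parameters $b_m$ together and define an invertible diagonal $\binom{n}{2} \times \binom{n}{2}$
matrix

$$b = \operatorname{diag}(b_1, \dots, b_{n-1}) \in D\!\left(\tbinom{n}{2}\right),$$

where $D\!\left(\tbinom{n}{2}\right)$ is the group of invertible diagonal $\binom{n}{2} \times \binom{n}{2}$ matrices. The recurrence (1.3.10)
constructs a matrix $g_n = g^{(b)} \in \mathrm{GL}(n)$ given any $b \in D\!\left(\tbinom{n}{2}\right)$, and we have proved that

$$M_{\mathcal{R}}(n) \ni x \;\leftrightarrow\; \{\, g_n \in \mathrm{GL}(n) \mid g_n \Lambda_n g_n^{-1} \in M_{\mathcal{R}}(n) \text{ and } e_n^T g_n = \text{ones} \,\} \;\leftrightarrow\; b$$

are bijections. We can use these bijections to define an action of $D\!\left(\tbinom{n}{2}\right)$ on $M_{\mathcal{R}}(n)$ by

$$D\!\left(\tbinom{n}{2}\right) \times M_{\mathcal{R}}(n) \to M_{\mathcal{R}}(n),$$
$$b'.x = g^{(b'b)} \Lambda_n \left(g^{(b'b)}\right)^{-1} \quad \text{if } x = g^{(b)} \Lambda_n \left(g^{(b)}\right)^{-1}.$$

This is a description of Kostant and Wallach's group action in [KW2, Theorem 5.9],
which is revisited in Section 3 (see Remark 3.16) from a more natural point of view.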

The conclusion (remember that we are still in the generic case) is that each choice
of nonzero $s = (b_1, \dots, b_{n-1})$ will determine a member of $M_{\mathcal{R}}(n)$, and different $s$'s
yield different matrices in $M_{\mathcal{R}}(n)$. For each fixed $s$, as $\mathcal{R}$ ranges over all generic $\mathcal{R}$'s, we
get a transverse slice of $M_\Omega(n)$. It seems a blemish that our dual coordinates $b_m$ had to
be non-zero; this will be removed naturally in the Lie format. The canonical coordinates
will be angle coordinates $q$, while the non-zero coordinates will essentially appear as $e^q$.
The non-zero coordinates do have the advantage of being single-valued on the fibres,
though.

1.4 Complementary coordinates for the elementary conjugations. 

For completeness's sake, we describe the dual coordinates that correspond to the elementary
conjugations for generic $\mathcal{R}$.

Transposition. Let $b = (b_1, b_2, \dots, b_{n-1})$ be the dual coordinates of $x \in M_{\mathcal{R}}(n) \subset
M_\Omega(n)$. To find the dual coordinates of $x^T$ it is necessary to invoke the special diagonal
matrices

$$\Sigma_m = -P_{m+1}(\Lambda_m)\,(P'_m(\Lambda_m))^{-1}, \qquad 1 \le m \le n-1,$$

relating $b_m$ to $c_m$, which are given in Theorem 1.3.4, and also the diagonal matrices $\Pi_m$
(appearing as $\Pi$ in Lemma 1.3.6), which relate row and column eigenvectors. The diagonal
matrices $\Sigma_m$ and $\Pi_m$ depend only on $\mathcal{R}$, not on $b_m$.

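Lemma 1.4.1. Let the dual coordinates of $x^T$ be $\tilde{b} = (\tilde{b}_1, \tilde{b}_2, \dots, \tilde{b}_{n-1})$. Then, for $1 \le
m \le n-1$,

$$\operatorname{diag}(\tilde{b}_m) = \Pi_m \operatorname{diag}(b_m)^{-1} \Sigma_m.$$

Proof. $x_m = g_m \Lambda_m g_m^{-1}$ implies

$$x_m^T = (g_m^{-1})^T \Lambda_m g_m^T = \left((g_m^{-1})^T \Pi_m\right) \Lambda_m \left(\Pi_m^{-1} g_m^T\right),$$

and the last row of $(g_m^{-1})^T \Pi_m$ is ones by (1.3.8). Transposition requires that the last
row of $(\Pi_m^{-1} g_m^T \oplus 1)\, x_{m+1}^T\, ((g_m^{-1})^T \Pi_m \oplus 1)$ be equal to the last column of

$$(\Pi_m g_m^{-1} \oplus 1)\, x_{m+1}\, (g_m \Pi_m^{-1} \oplus 1) = \begin{pmatrix} \Lambda_m & \Pi_m c_m \\ b_m^T \Pi_m^{-1} & \delta_{m+1} \end{pmatrix};$$

therefore $\tilde{b}_m = \Pi_m c_m$, and the Lemma follows using $\operatorname{diag}(b_m)\operatorname{diag}(c_m) = \Sigma_m$. □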

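Diagonal similarity. Let $\tilde{x} = d x d^{-1}$. As usual, $d_m$ denotes the leading principal submatrix
of $d$, and so we denote the $(m,m)$th entry by $d(m)$. Let $\tilde{b} = (\tilde{b}_1, \tilde{b}_2, \dots, \tilde{b}_{n-1})$
denote the dual coordinates of $d x d^{-1} = \tilde{g}_n \Lambda_n \tilde{g}_n^{-1}$. We note immediately that all $g_m$'s
in (1.3.1) are normalized to have ones in the last row, so we cannot have $\tilde{g}_m = d_m g_m$. To
rectify this we define $\hat{d}_m = d_m / d(m)$. In particular,

$$\hat{d}_m x_m \hat{d}_m^{-1} = d_m x_m d_m^{-1}.$$

Lemma 1.4.2. For generic $\mathcal{R}$,

$$\tilde{b}_m = b_m\, d(m+1)/d(m), \qquad 1 \le m \le n-1.$$

Proof. We have

$$\tilde{x}_m = \hat{d}_m x_m \hat{d}_m^{-1} = \hat{d}_m g_m \Lambda_m g_m^{-1} \hat{d}_m^{-1},$$

so we calculate

$$\begin{aligned}
(g_m^{-1}\hat{d}_m^{-1} \oplus 1)\,\tilde{x}_{m+1}\,(\hat{d}_m g_m \oplus 1)
&= (g_m^{-1}\hat{d}_m^{-1} \oplus 1)\, d_{m+1} x_{m+1} d_{m+1}^{-1}\,(\hat{d}_m g_m \oplus 1) \\
&= \begin{pmatrix} d(m) I_m & \\ & d(m+1) \end{pmatrix} (g_m^{-1} \oplus 1)\, x_{m+1}\, (g_m \oplus 1) \begin{pmatrix} d(m) I_m & \\ & d(m+1) \end{pmatrix}^{-1} \\
&= \begin{pmatrix} d(m) I_m & \\ & d(m+1) \end{pmatrix} \begin{pmatrix} \Lambda_m & c_m \\ b_m^T & \delta_{m+1} \end{pmatrix} \begin{pmatrix} d(m) I_m & \\ & d(m+1) \end{pmatrix}^{-1} \\
&= \begin{pmatrix} \Lambda_m & c_m \dfrac{d(m)}{d(m+1)} \\[6pt] b_m^T \dfrac{d(m+1)}{d(m)} & \delta_{m+1} \end{pmatrix},
\end{aligned}$$

so that

$$\tilde{c}_m = c_m\,(d(m)/d(m+1))$$

and

$$\tilde{b}_m = b_m\,(d(m+1)/d(m)),$$

as claimed. □

One may ask what exactly is preserved by elementary conjugations, and the answer is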
not Ritz values but rather all principal minor determinants. More precisely, a result due
to Loewy [Lo, Theorem 1] is that under some non-degeneracy conditions two matrices
have equal corresponding principal minors if and only if they are equivalent under an
elementary conjugation.

There are $2^n - 1$ non-trivial principal minors, but only $n^2 - n + 1$ of them are independent,
so the minors satisfy many relations, in contrast to what happens for Ritz values.
While a full analysis of the problem is outside the scope of this article, we emphasize that 
this is again a prime example of the applicability of Lie- and representation-theoretic and 
geometric methods to a problem in matrix theory by exploiting its symmetry. 



1.5 Genericity conditions 

Consider the following problem: let $B_m \in M(m)$ be any matrix, and suppose we wish to
find

$$B_{m+1} = \begin{pmatrix} B_m & c \\ b^T & \delta \end{pmatrix} \in M(m+1)$$

such that

$$\det(\lambda I_{m+1} - B_{m+1}) = \prod_{1 \le i \le m+1} (\lambda - \lambda_i) \tag{1.5.1}$$

for given $\lambda_1, \dots, \lambda_{m+1} \in \mathbb{C}$. (Recall that $\delta = \operatorname{tr}(B_{m+1}) - \operatorname{tr}(B_m)$ is fixed.) Equation (1.5.1)
is an algebraic constraint on the $2m$ coordinates of $b$ and $c$. The $2m$ coordinates
must satisfy $m$ polynomial equations, therefore under sufficiently general conditions
we expect an $m$-dimensional set of solutions, while under degenerate conditions the dimension
may increase, or there may be no solutions at all. Let us give several interpretations
of this problem, and examine the role of each genericity condition. This will introduce
useful notions from linear systems theory; this is another field not generally known to
matrix theorists. These sections show why the strictly lower triangular part of $x$ is not
a viable choice of coordinates for $x$ complementary to $\mathcal{R}$.

1.6 Observability and controllability 

Consider the system of ordinary differential equations given by

$$\begin{aligned} \dot{x}(t) &= B_m x(t) + c\, u(t), \\ y(t) &= b^T x(t) + \delta\, u(t). \end{aligned} \tag{1.6.1}$$

This represents a continuous time-invariant linear system (SISO)^5 with state $x(t) \in \mathbb{C}^m$,
control $u(t) \in \mathbb{C}$, and output $y(t) \in \mathbb{C}$. Possibly abusing language,

The pair $\begin{pmatrix} B_m \\ b^T \end{pmatrix}$ is observable when $\operatorname{rank}(b, B_m^T b, \dots) = m$.

The pair $\begin{pmatrix} B_m & c \end{pmatrix}$ is controllable when $\operatorname{rank}(c, B_m c, \dots) = m$.

^5 SISO = Single Input, Single Output

The algebraic significance of observability and controllability will become clearer if
we introduce the following terminology. Let $\mathbb{C}^m$ be the space of column vectors. A
vector $v \in \mathbb{C}^m$ is called a cyclic vector for a matrix $B_m$ if for any $w \in \mathbb{C}^m$ there exists a
polynomial $f(x) \in \mathbb{C}[x]$ such that $f(B_m)v = w$. In this language, the system $\begin{pmatrix} B_m & c \end{pmatrix}$
is controllable if and only if $c$ is a cyclic vector for $B_m$, and $\begin{pmatrix} B_m \\ b^T \end{pmatrix}$ is observable if and
only if $b$ is a cyclic vector for $B_m^T$. Matrix theorists would say that $c$ is a cyclic vector
for $B_m$ if the minimal polynomial of $c$ for $B_m$ has (maximal) degree $m$: $f(B_m)c = 0$
for a nonzero polynomial $f$ only if $\deg f \ge m$. The centralizer of an element $x$ of a Lie
algebra $\mathfrak{g}$ is the set of elements $\{\, y \in \mathfrak{g} \mid xy = yx \,\}$ that commute with $x$. (Lie algebras
are briefly discussed elsewhere. Here one may read $\mathfrak{g} = M(n)$ and ignore the appellation.
Those familiar with functional analysis or operator algebras will also recognize this as
the definition of the commutant of $\{x\} \subset M(n) = \operatorname{End}(\mathbb{C}^n)$.)

Theorem 1.6.2. A matrix $B_m$ has a cyclic vector if and only if the centralizer of $B_m$
coincides with the algebra $\{\, f(B_m) \mid f \in \mathbb{C}[x] \,\}$ of polynomials in $B_m$.

This property is known to Lie theorists as regularity (not to be confused with the 
property of being invertible — a matrix with only zero eigenvalues may well be regular 
in our sense). For matrix theorists, it is equivalent to being non-derogatory, i.e., the 
minimal polynomial equals the characteristic polynomial. Clearly the identity element 
is far from regular; a diagonal matrix is regular if and only if its diagonal entries are 
distinct. 

Let $P_m(\lambda) = \det(\lambda I_m - B_m)$ and $P_{m+1}(\lambda) = \det(\lambda I_{m+1} - B_{m+1})$. Block elimination
yields

$$P_{m+1}(\lambda) = P_m(\lambda)\left(\lambda - \delta - b^T(\lambda I_m - B_m)^{-1} c\right). \tag{1.6.3}$$

Assume (1.5.1), that is, that $B_{m+1}$ has the specified eigenvalues.

Theorem 1.6.4. Given $b$, there exists a unique $c$ such that (1.5.1) is satisfied if and only
if $\begin{pmatrix} B_m \\ b^T \end{pmatrix}$ is observable. Given $c$, there exists a unique $b$ such that (1.5.1) is satisfied if
and only if $\begin{pmatrix} B_m & c \end{pmatrix}$ is controllable.

Proof. This is a variation on a standard problem in control theory. Re-write (1.6.3) as

$$b^T(\lambda I_m - B_m)^{-1} c = \lambda - \delta - \frac{P_{m+1}(\lambda)}{P_m(\lambda)}.$$

By construction (recall that $\delta$ is determined by $\delta = \operatorname{tr}(B_{m+1}) - \operatorname{tr}(B_m)$), the right-hand
side is holomorphic at $\lambda = \infty$, and expanding both sides into power series gives

$$\sum_{k=1}^{\infty} \lambda^{-k}\, b^T B_m^{k-1} c = \sum_{k=1}^{\infty} \lambda^{-k} g_k$$

for some numbers $g_k \in \mathbb{C}$. Equating powers of $\lambda$, $b$ and $c$ must satisfy the (infinite)
system of equations

$$b^T B_m^{k-1} c = g_k, \qquad k = 1, 2, \dots$$

The condition that there exist a unique solution $c$ is exactly that the matrix with rows
$b^T, b^T B_m, b^T B_m^2, \dots$ have full column rank, i.e., observability.

The proof considering controllability is analogous. □



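Note. Condition (G2$_m$) holds exactly when $P_{m+1}(\lambda)$ and $P_m(\lambda)$ are relatively prime, in
which case the right-hand side of

$$\sum_{k=1}^{\infty} \lambda^{-k}\, b^T B_m^{k-1} c = b^T(\lambda I_m - B_m)^{-1} c = \lambda - \delta - \frac{P_{m+1}(\lambda)}{P_m(\lambda)}$$

is a rational function of degree $m$. In that case, if $b$ and $c$ constitute a solution, the
Hankel matrix

$$\begin{pmatrix} b^T c & b^T B_m c & \cdots \\ b^T B_m c & b^T B_m^2 c & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}
 = \begin{pmatrix} b^T \\ b^T B_m \\ b^T B_m^2 \\ \vdots \end{pmatrix}
   \begin{pmatrix} c & B_m c & \cdots \end{pmatrix}$$

has rank $m$ and so the system must be both observable and controllable.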

Example. If Bm is an unreduced upper Hessenberg matrix, then the row (0, ...,0, 1) 
always yields an observable system. 
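
The rank conditions for observability and controllability are straightforward to test. The following is a minimal sketch in exact arithmetic; by the Cayley-Hamilton theorem it suffices to use the first $m$ Krylov vectors. The function names and sample matrices are our own choices for illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        piv = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for r in range(rk + 1, len(rows)):
            f = rows[r][col] / rows[rk][col]
            rows[r] = [v - f * w for v, w in zip(rows[r], rows[rk])]
        rk += 1
    return rk

def krylov(mat, v):
    """The vectors v, mat v, ..., mat^{m-1} v, returned as a list of m vectors."""
    m = len(mat)
    vecs = [list(v)]
    for _ in range(m - 1):
        prev = vecs[-1]
        vecs.append([sum(mat[i][j] * prev[j] for j in range(m)) for i in range(m)])
    return vecs

def is_controllable(bm, c):
    """rank(c, B c, ..., B^{m-1} c) = m, i.e. c is a cyclic vector for B."""
    return rank(krylov(bm, c)) == len(bm)

def is_observable(bm, b):
    """rank(b, B^T b, ..., (B^T)^{m-1} b) = m, i.e. b is a cyclic vector for B^T."""
    bm_t = [list(col) for col in zip(*bm)]
    return rank(krylov(bm_t, b)) == len(bm)

# Hypothetical samples: an unreduced upper Hessenberg matrix and two diagonal ones.
hess = [[1, 2, 3], [1, 4, 5], [0, 1, 6]]
regular_diag = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
derogatory_diag = [[1, 0, 0], [0, 1, 0], [0, 0, 2]]
```

With these samples, the row $(0, 0, 1)$ is observable for `hess`; for `regular_diag` a column is controllable exactly when it has no zero entry; and for `derogatory_diag`, which is not regular, no column is controllable.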



Example 1.6.5. Consider the case when $B_m$ is a regular diagonal matrix (thus (G1$_m$)
holds, and (G2$_m$) by assumption; we are not assuming (G1$_{m+1}$)). Then (1.5.1) has
solutions if and only if $\begin{pmatrix} B_m \\ b^T \end{pmatrix}$ is observable, if and only if $\begin{pmatrix} B_m & c \end{pmatrix}$ is controllable, if
and only if all the entries of $b$ and of $c$ are non-zero. (See Theorem 1.3.4.)

If $B_m$ is regular and semi-simple, but not diagonal, then it is still true that $\begin{pmatrix} B_m \\ b^T \end{pmatrix}$
is observable if and only if $\begin{pmatrix} B_m & c \end{pmatrix}$ is controllable: if $g_m^{-1} B_m g_m$ is diagonal, then this
happens when $b^T g_m$, resp. $g_m^{-1} c$, has non-zero entries. (This is relevant to the matrix
in (1.3.2).)

A matrix $B_m$ has a cyclic vector (if and only if $B_m^T$ has a cyclic vector), if and only
if $B_m$ is regular. So if $B_m$ is not regular (in our sense), then the system $\begin{pmatrix} B_m \\ b^T \end{pmatrix}$ is never
observable, nor is $\begin{pmatrix} B_m & c \end{pmatrix}$ ever controllable.



1.7 Beyond the generic case 

The criterion in Example 1.6.5, that the entries of the row/column be non-zero, may be
readily generalized to the case when $B_m$ is regular, but (G1$_m$) fails to hold.

Lemma 1.7.1. Suppose that $B_m$ is in the form

$$B_m = \operatorname{diag}(J_{m_1}(d_1), \dots, J_{m_t}(d_t)),$$

where each Jordan block

$$J_{m_i}(d_i) = \begin{pmatrix} d_i & & & \\ 1 & d_i & & \\ & \ddots & \ddots & \\ & & 1 & d_i \end{pmatrix} \in M(m_i)$$

and $d_1, \dots, d_t$ are distinct. Then a row $(y_1, \dots, y_m)$ is observable if and only if each of
the entries $y_{m_1}, y_{m_1+m_2}, \dots, y_m$ is nonzero, namely the last entry in each segment.

Sketch of proof. The matrix $B_m^T$ is block-diagonal; let us write

$$(y_1, \dots, y_m)^T = y^{(1)} \oplus \cdots \oplus y^{(t)}$$

so that

$$y^{(i)} = (y_{k_i+1}, \dots, y_{k_i+m_i})^T, \qquad k_i = \sum_{t' < i} m_{t'}.$$

We claim that $(y_1, \dots, y_m)^T$ is a cyclic vector for $B_m^T$ if and only if $y^{(i)}$ is a cyclic
vector for $J_{m_i}(d_i)^T$ for each $1 \le i \le t$. One way to see this is to recall that a vector $y$
is cyclic for any matrix $B_m^T$ if and only if $B'y = 0$ implies $B' = 0$, for every $B'$ in
the centralizer of $B_m^T$. Because our $B_m^T$ is regular, its centralizer just consists of block-diagonal
matrices $B' = J'_1 \oplus \cdots \oplus J'_t$, where each $J'_i$ commutes with $J_{m_i}(d_i)^T$. This
verifies our claim. This reduces the problem to calculating cyclic vectors for a Jordan
block, which is straightforward. □
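
The criterion of Lemma 1.7.1 can be confirmed by brute force on a small example. The sketch below is an illustration only: it takes the hypothetical matrix $B_3 = \operatorname{diag}(J_2(1), J_1(2))$ and compares the rank test with the "last entry of each segment" criterion over a grid of integer rows. Helper names are ours.

```python
from fractions import Fraction
from itertools import product

def jordan_lower(blocks):
    """diag(J_{m_1}(d_1), ...) with d_i on the diagonal and 1 on the subdiagonal."""
    m = sum(size for size, _ in blocks)
    mat = [[Fraction(0)] * m for _ in range(m)]
    k = 0
    for size, d in blocks:
        for i in range(size):
            mat[k + i][k + i] = Fraction(d)
            if i > 0:
                mat[k + i][k + i - 1] = Fraction(1)
        k += size
    return mat

def rank(rows):
    """Rank by Gaussian elimination over the rationals."""
    rows = [list(r) for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        piv = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for r in range(rk + 1, len(rows)):
            f = rows[r][col] / rows[rk][col]
            rows[r] = [v - f * w for v, w in zip(rows[r], rows[rk])]
        rk += 1
    return rk

def is_observable(bm, y):
    """True when the rows y^T, y^T B, ..., y^T B^{m-1} are linearly independent."""
    m = len(bm)
    rows = [[Fraction(v) for v in y]]
    for _ in range(m - 1):
        prev = rows[-1]
        rows.append([sum(prev[i] * bm[i][j] for i in range(m)) for j in range(m)])
    return rank(rows) == m

blocks = [(2, 1), (1, 2)]      # J_2(1) and J_1(2): distinct d_i, so B is regular
bm = jordan_lower(blocks)
last_positions = [1, 2]        # 0-based index of the last entry of each segment
agrees = all(
    is_observable(bm, y) == all(y[k] != 0 for k in last_positions)
    for y in product([0, 1, -2], repeat=3)
)
```

The flag `agrees` records that, on all 27 test rows, observability holds exactly when $y_2$ and $y_3$ are both nonzero, while $y_1$ is unconstrained.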

Hence, if some generalized eigenvalue of a regular $B_m$ has multiplicity, the set of
rows making the system observable, and also the set of columns making the system
controllable, is isomorphic to $(\mathbb{C}^\times)^t \times \mathbb{C}^{m-t}$. (And we recall that neither observability
nor controllability is possible when $B_m$ is not regular.)

We conclude with the observation that the geometric structure of the set of solutions
for $b$ and $c$ satisfying (1.5.1), at least for regular $B_m$, depends only on how many Ritz
values coincide, and not on the Ritz values themselves. This explains why transverse slices
are possible when we restrict $\mathcal{R}$ to belong to a subset of Ritz values of fixed combinatorial
type, such as the generic Ritz values introduced in Section 1.2.



2 Lie theory 

For the reader who is not familiar with Lie theory and geometric terminology, we include 
an overview of the concepts necessary to present the results. Our purpose is not to give 
formal or abstract definitions, although we do so as necessary, but to paint a clear picture 
of the construction and how it relates to several important areas of mathematics. This 
section is independent of the rest of the paper, and may be ignored by the cognoscenti.

2.1 Basic geometry: integral curves on a vector field

This concept is constantly used in Section 3 every time a formula contains an expression
like "$\exp(q\xi)$." It also paves the way for our discussion of integrable systems.

Start with a vector field $\xi$ on $M$. (For concreteness, think of $M$ as a smooth manifold,
although everything goes through for complex or algebraic varieties, and, in particular,
when we work with $M = M(n)$ our scalars will be in $\mathbb{C}$.) We have a tangent vector at
each point of $M$. Starting at some $x \in M$, we may look for a curve that passes through $x$
and is everywhere tangent to the vector field $\xi$:

$$\phi(0) = x, \qquad \frac{d}{dt}\phi(t) = \xi(\phi(t)). \tag{2.1.1}$$

By the theory of ordinary differential equations, there exists a unique solution for all $t$ in
some open interval containing 0, but there is, in general, no reason to expect a solution
for all $t \in \mathbb{R}$.

Definition. If (2.1.1) does have a solution for all $x \in M$ and all $t \in \mathbb{R}$, one says, variously,
that the flow defined by $\xi$ is complete, or that $\xi$ is complete, or that $\xi$ is (globally)
integrable.

For obvious reasons, the flow defined by (2.1.1) is called $\exp(t\xi)$, i.e., $\exp(t\xi)(x) =
\phi(t)$, where $\phi(t)$ is defined as above. One can check that solutions to (2.1.1) satisfy
$\exp((s + t)\xi) = \exp(s\xi)\exp(t\xi)$. The word "flow" is meant to suggest that as $t$ changes
all the points of $M$ are smoothly displaced from their positions, like particles in a fluid.

Example. Since the vector field $\xi$ has no singularities, intuitively there should be no
obstruction to integrating it. If $M$ is compact, then any vector field is complete (since
there exists some $\epsilon > 0$ such that a solution to (2.1.1) exists for $|t| < \epsilon$ for any $x \in M$,
and these patch together).

Example. As an example of what can happen when $M$ is not compact, let $M = \mathbb{R}$
with coordinate $x$, and let $\xi = x^2\, d/dx$. Then an integral curve of $\xi$ through any point
would satisfy $(d/dt)\phi(t) = \phi(t)^2$, whose solutions blow up. (The previous example shows
that nothing bad happens as long as the integral curve remains bounded. If $\phi(0) \ne 0$,
what happens is that we fall off the "end" of $M$ in a finite amount of time. The same
phenomenon may be seen with $\xi = d/dx$ and $M = (0, 1)$.)

Example. It may be argued that the last example is misleading: if we consider $\mathbb{R} \subset \mathbb{R}P^1 =
\mathbb{R} \cup \{\infty\}$, then

$$\exp\!\left(t\, x^2 \tfrac{d}{dx}\right)(x) = \frac{x}{1 - tx}$$

becomes a perfectly good flow, with a fixed point at $x = 0$. No such trick will enable one
to integrate $x^3\, d/dx$, though. The vector field $x^3\, d/dx$ has a pole at $x = \infty$, and there is
no way to embed $\mathbb{R}$ as a subset $U$ of some manifold $M$ and have a complete flow on $M$
whose infinitesimal action on the subset $U$ is $x^3\, d/dx$.
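
The different behaviour of the quadratic and cubic vector fields at infinity can be seen by passing to the coordinate $w = 1/x$ near $x = \infty$. The short computation below is added as a sketch; the symbol $w$ is our own.

```latex
w = 1/x, \qquad \frac{dw}{dt} = -\frac{1}{x^2}\,\frac{dx}{dt}.
\\[4pt]
\frac{dx}{dt} = x^2 \;\Longrightarrow\; \frac{dw}{dt} = -1
\quad\text{(regular at } w = 0\text{, i.e. at } x = \infty\text{)},
\\[4pt]
\frac{dx}{dt} = x^3 \;\Longrightarrow\; \frac{dw}{dt} = -x = -\frac{1}{w}
\quad\text{(a pole at } w = 0\text{)}.
```

In the first case the vector field extends smoothly across the point at infinity; in the second it does not.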
2.2 Classical mechanics and Poisson geometry

The evolution of a mechanical system can be seen as the orbit of a point (initial state)
under the action of time. In classical mechanics, the evolution of the system in time is
determined by Hamilton's equations

$$\frac{dq_i}{dt} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i} \tag{2.2.1}$$

(for a particle in $\mathbb{R}^n$), where $q \in \mathbb{R}^n$ is position and $p \in \mathbb{R}^n$ is momentum, and the
"Hamiltonian" function $H$ is the energy of the particle.

We must keep track of time. Let $f = f(q, p)$ be a classical observable, which means
any smooth function $\mathbb{R}^{2n} \to \mathbb{R}$. The function $f$ does not depend on time, but let us
write $f(t) = f(q(t), p(t)) : \mathbb{R}^{2n} \to \mathbb{R}$ for the observable resulting from picking a point,
waiting for time $t$ to elapse, and only then measuring the value of $f$. (In particular,
$f(0) = f = f(q, p)$.)

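Instead of considering just position or just momentum, we can re-write Hamilton's
equations as

$$\frac{d}{dt}\bigl(f(t)\bigr) = \{H, f\}(t). \tag{2.2.2}$$

The right-hand side is the value at time $t$ of the Poisson bracket

$$\{H, f\} = \sum_i \left( \frac{\partial H}{\partial p_i}\frac{\partial f}{\partial q_i} - \frac{\partial H}{\partial q_i}\frac{\partial f}{\partial p_i} \right). \tag{2.2.3}$$

Physicists have concluded from staring at (2.1.1) and (2.2.2) that the Hamiltonian $H$
generates time evolution: if the map $f \mapsto \{H, f\}$ is thought of as defining a vector
field $\xi_H = \sum_i \left( \frac{\partial H}{\partial p_i}\frac{\partial}{\partial q_i} - \frac{\partial H}{\partial q_i}\frac{\partial}{\partial p_i} \right)$, then Hamilton's equations are satisfied if and only if the
state of the system follows the flow of $\xi_H$:

$$f(t) = f \circ \exp(t\,\xi_H). \tag{2.2.4}$$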

Therefore, time and time-evolution naturally appear as soon as one writes down the 
Hamiltonian. Any function H (thought of as "energy") generates a complementary coor- 
dinate t (thought of as "time"), such that the system evolves in "time" so that energy is 
conserved. This is the essence of Hamiltonian geometry. 
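
As a sanity check of (2.2.2), the following minimal sketch uses a harmonic oscillator with one degree of freedom. The Hamiltonian $H = (q^2 + p^2)/2$, the observable $f = qp$, and all function names are illustrative choices, not taken from the text. For this $H$ the flow is an explicit rotation of the phase plane, so the time derivative of $f$ along the flow can be compared numerically with the bracket $\{H, f\} = p^2 - q^2$.

```python
import math

def flow(q, p, t):
    """Exact time-t flow of H = (q^2 + p^2)/2: a rotation of the phase plane."""
    c, s = math.cos(t), math.sin(t)
    return q * c + p * s, -q * s + p * c

def H(q, p):
    return 0.5 * (q * q + p * p)

def f(q, p):
    return q * p  # an arbitrary observable, chosen for illustration

def bracket_H_f(q, p):
    # {H, f} = dH/dp * df/dq - dH/dq * df/dp = p*p - q*q for the choices above
    return p * p - q * q

def ddt_f_along_flow(q, p, t, h=1e-5):
    # central difference of t -> f(q(t), p(t))
    return (f(*flow(q, p, t + h)) - f(*flow(q, p, t - h))) / (2 * h)

q0, p0, t = 0.3, -1.1, 0.7
qt, pt = flow(q0, p0, t)
lhs = ddt_f_along_flow(q0, p0, t)   # d/dt f(t)
rhs = bracket_H_f(qt, pt)           # {H, f}(t)
print(abs(lhs - rhs) < 1e-7, abs(H(qt, pt) - H(q0, p0)) < 1e-12)
```

The second printed comparison checks that $H$ itself is conserved along its own flow, which is the statement that energy is conserved.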

This formalism also goes through for any space $M$ with a Poisson bracket, not just $\mathbb{R}^{2n}$.
The appropriate generalization of (2.2.3) is

Definition. A manifold $M$ is a Poisson manifold if the algebra $\mathcal{O}(M)$ of functions $M \to \mathbb{R}$
has a Poisson structure, i.e., there is a bracket

\[ \{\,,\}\colon \mathcal{O}(M) \times \mathcal{O}(M) \to \mathcal{O}(M) \]

making $\mathcal{O}(M)$ into a Lie algebra ($\{\,,\}$ is bilinear, antisymmetric, and satisfies the
Jacobi identity) and satisfying the Leibniz identity

\[ \{f, gh\} = \{f, g\}h + g\{f, h\}. \]

Classical mechanics takes place on Poisson manifolds.

2.3 Integrable systems 

Joseph Liouville concerned himself with characterizing those cases when explicit solutions
to the equations (2.2.1) may actually be found. He proved this is the case when there
are sufficiently many (independent) commuting Hamiltonians; this is known as complete
integrability in the sense of Liouville. The idea is that for a completely integrable system
we can (explicitly) find canonical coordinates.

We have seen in (2.2.4) that every function ("Hamiltonian") on a phase space has
a complementary coordinate associated with it, given by following some flow.
Equation (2.2.2) says that a function $f$ is constant along the flow of $\xi_H$ if and only if $\{H, f\} = 0$.
(In classical mechanics, such functions are called (first) integrals.) So, if $f_1, \dots, f_n$ are
functions such that $\{H, f_i\} = 0$ and $\{f_i, f_j\} = 0$ for all $i$ and $j$, then the trajectory of
the system is contained in a level set of $(f_1, \dots, f_n)$. (Commutativity with respect to
the Poisson bracket means that the flow corresponding to each function conserves all the
other functions.)

Not every system has sufficiently many independent integrals of motion. ("Independent"
means that their differentials are linearly independent on a dense open subset of
the phase space.) If there are enough independent first integrals which are simultaneously
observable (that is, which commute with respect to $\{\,,\}$), and if the associated flows
are complete, then we get a system of coordinates on the entire phase space given by the values
of each of the functions together with the dual coordinates along the level sets (fibres!)
given by following the flows.

Integrable systems are commonly defined in the case when $M$ is a symplectic manifold,
rather than in the more general case of a Poisson manifold. (A symplectic manifold is a
manifold which has a closed non-degenerate 2-form; it is a very special kind of Poisson
manifold.) Since we will later assert that Kostant and Wallach's Gelfand-Zeitlin algebra
defines an integrable system^{6} on $M(n)$, which is not symplectic, we give the more general
definition.

First of all, what is the maximum possible number of independent commuting Hamiltonians?
(For a symplectic $M$, this is $\tfrac12 \dim M$.) Let $M$ be a Poisson manifold. The
rank of the Poisson structure is defined to be the maximum possible number of linearly
independent Hamiltonian vector fields at a point, i.e.,

\[ \operatorname{rank}\{\,,\} = \max_{x \in M} \dim \langle (\xi_f)_x \mid f \in \mathcal{O}(M) \rangle. \]

One can show that any Poisson-commutative algebra of functions has dimension at
most $\dim M - \tfrac12 \operatorname{rank}\{\,,\}$.

Definition 2.3.1. A Poisson manifold $M$ of rank $r$ together with a maximal
Poisson-commutative algebra $A \subset \mathcal{O}(M)$ is a (completely) integrable Hamiltonian system if and
only if

\[ \dim A = \dim M - \frac{r}{2}. \]

^{6} This system, and its complete integrability, was already known to Thimm [T] and
Guillemin-Sternberg [GS] in the 1980s.

Liouville showed [Li] how to construct canonical coordinates for an integrable system
and solve Hamilton's equations. (A modern treatment may be found in [A]; for the
generalization to the Poisson case, see, for example, [LMV].)

The coordinates dual to the functions are called angle coordinates.^{7} Denote them
by $\varphi_i$; we remark that once the angle coordinates are known, the system (2.2.1) is
equivalent to

\[ \frac{d}{dt}\varphi_i(t) = \text{constant}, \]

which is trivial to integrate.

Note that the "canonical" coordinates do depend on the choice of Hamiltonians. Also, 
the complementary coordinates are measured from a basepoint, which must be specified 
(note that the level sets may not even be connected). 

Example 2.3.2. An interesting system is a Lax pair

\[ \frac{dL}{dt} = [A, L], \]

where $A$ and $L$ are matrices. This differential equation describes an isospectral flow

\[ L(t) = g(t)L(0)g(t)^{-1} \]

($g(t)$ and $A(t)$ are related via $dg/dt = Ag$), i.e., the eigenvalues of $L$ are invariant over
time. It follows that any functional $f$ such that $f(gLg^{-1}) = f(L)$ is conserved; e.g.,
the coefficients of $\det(\lambda - L(t))$, or, alternatively, the functions $\operatorname{tr}(L^k)$, are conserved
quantities. Therefore, writing a system as a Lax pair exposes many integrals of motion (in
fact, any completely integrable system can be written as a Lax pair, although constructing
Lax pairs equivalent to integrable systems, and vice versa, is a far-from-trivial subject).
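
The isospectral property can be checked on a toy example. The sketch below is only an illustration under stated assumptions: the matrices `A` and `L0` are hypothetical, and `A` is chosen constant with $A^2 = 0$ so that $g(t) = \exp(tA) = I + tA$ is available in exact rational arithmetic. It verifies that $L(t) = g(t)L(0)g(t)^{-1}$ satisfies $dL/dt = [A, L]$ and that the traces $\operatorname{tr}(L^k)$ do not change.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def lincomb(a, X, b, Y):
    """Entrywise a*X + b*Y."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def trace_power(X, k):
    P = X
    for _ in range(k - 1):
        P = matmul(P, X)
    return sum(P[i][i] for i in range(len(X)))

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A = [[0, 2, 5], [0, 0, 0], [0, 0, 0]]      # hypothetical choice with A*A = 0
L0 = [[1, 2, 0], [3, -1, 4], [0, 5, 2]]    # hypothetical initial matrix L(0)

def L(t):
    g = lincomb(1, I3, t, A)        # g(t) = exp(tA) = I + tA because A*A = 0
    g_inv = lincomb(1, I3, -t, A)   # g(t)^{-1} = I - tA
    return matmul(matmul(g, L0), g_inv)

t, h = Fraction(3, 7), Fraction(1, 10)
# L(t) is quadratic in t here, so the central difference is the exact derivative.
dL = lincomb(1 / (2 * h), L(t + h), -1 / (2 * h), L(t - h))
commutator = lincomb(1, matmul(A, L(t)), -1, matmul(L(t), A))
print(dL == commutator)
print([trace_power(L(t), k) for k in (1, 2, 3)] == [trace_power(L0, k) for k in (1, 2, 3)])
```

A general Lax pair has a time-dependent $A$ and no closed form for $g(t)$; the nilpotent choice is made here only so that every step is exact.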

Not only is this an enormously successful method for actually solving various
integrable systems (including various non-linear partial differential equations, such as KdV^{8}),
but Lax pairs also bring in Lie theory and geometry in a natural way. While no discussion of
integrable systems can be complete without mentioning Lax pairs, the subject is too great
to attempt a thorough treatment here; see [BBT] for an overview. For an application of
Lax pairs to the topic of this paper, see [BP].



^{7} Beware that the corresponding action coordinates are not usually the same as the particular functions
used to specify an integrable system. In the case of generic Ritz values, the Ritz values themselves will
be action coordinates (cf. [KW2, Theorem 5.23]), but this point is not needed in this paper and we will
not pursue it.

^{8} $v_\tau = 6 v v_z + v_{zzz}$

We adduce that Kostant and Wallach's theory should be thought of in this framework.
There the phase space will be the space of all matrices, and the Hamiltonians (the
conserved quantities) will be (certain symmetric functions of) the Ritz values; the
fibres $M_{\mathcal{R}}(n)$ will be their level sets. Taking all of the Ritz values gives a maximal
Poisson-commutative algebra of observables, and their number is exactly enough to make
the system completely integrable.



3 Kostant-Wallach theory

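The algebra of matrices $M(n)$ has a Poisson structure: let $\alpha_{ij}$ be the linear functional
defined so that $\alpha_{ij}(x) = x_{ij}$, and let $E_{ij}$ be the matrix with a 1 in the $(i,j)$th position
and 0 elsewhere (so $\alpha_{ij}(E_{kl}) = \delta_{ij,kl}$ in terms of Kronecker's $\delta$). Then

\[ [E_{ij}, E_{kl}] = \delta_{jk}E_{il} - \delta_{il}E_{kj}, \]

which specifies the Poisson structure as

\[ \{\alpha_{ij}, \alpha_{kl}\} = \delta_{jk}\alpha_{il} - \delta_{il}\alpha_{kj}. \]

This extends naturally to define the Poisson bracket of any two polynomial, or even
holomorphic, functions $M(n) \to \mathbb{C}$, because all such may be written in terms of the $\alpha_{ij}$.
The Leibniz rule yields

\[
\{f, g\} = \sum_{ij,\,kl} \frac{\partial f}{\partial \alpha_{ij}} \frac{\partial g}{\partial \alpha_{kl}} \{\alpha_{ij}, \alpha_{kl}\}.
\]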

Keeping this Poisson structure in mind, motivation for Kostant and Wallach's theory 
may be found in the theory of completely integrable Hamiltonian systems. 

Kostant and Wallach do not use any general theory in their original paper; the main
actors there are the familiar algebra $M(n)$ of all $n \times n$ matrices, and the Lie group $GL(n)$
of invertible $n \times n$ matrices, which acts on $M(n)$ via the adjoint representation

\[ \operatorname{Ad}(g)x := gxg^{-1}. \tag{3.1} \]

Recall that we are interested in quantities conserved under the action of some group, and
that we should look in advance for functions $M(n) \to \mathbb{C}$ that Poisson-commute. Even if
we did not know about Ritz values, we might be led to consider them as follows.

If we were interested in studying $n \times n$ matrices up to similarity, we would be studying
adjoint orbits, in other words, equivalence classes of matrices under similarity. A classical
problem is to find numerical invariants of these adjoint orbits, namely all polynomial
functions $f\colon M(n) \to \mathbb{C}$ (e.g., tr, det) such that $f(gxg^{-1}) = f(x)$ for all $x \in M(n)$ and
$g \in GL(n)$. The set of such functions may be denoted by

\[ \mathrm{Pol}(M(n))^{GL(n)} = \{\, f \in \mathrm{Pol}(M(n)) \mid f \text{ is } GL(n)\text{-invariant} \,\}. \]

The solution to this classical problem is that any such function is a symmetric polynomial
in the roots of the characteristic polynomial, therefore is equal to a polynomial in $\operatorname{tr}(x^k)$,^{9}
$k = 1, \dots, n$. Since these functions are constant on adjoint orbits, not only do they
Poisson-commute, but the corresponding vector fields are identically zero. However, using
this observation, it follows by induction downwards on $m$ that
$\{\operatorname{tr}(x_{m_1}^{k_1}), \operatorname{tr}(x_{m_2}^{k_2})\} = 0$
for any $m_1$, $k_1$, $m_2$, $k_2$ (cf. [KW1, Proposition 2.1]).

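Example. Let us write $\{\operatorname{tr}(x_2), \operatorname{tr}(x_3^2)\}$ out in coordinates. We can compute everything
in terms of the linear functionals $\alpha_{ij}$:

\[
\{\operatorname{tr}(x_2), \operatorname{tr}(x_3^2)\} = \{\alpha_{11} + \alpha_{22},\; \alpha_{11}^2 + \alpha_{22}^2 + \alpha_{33}^2 + 2\alpha_{12}\alpha_{21} + 2\alpha_{13}\alpha_{31} + 2\alpha_{23}\alpha_{32}\}.
\]

If we use the Leibniz rule repeatedly and expand, we find that

\begin{align*}
\{\alpha_{11} + \alpha_{22},\; \alpha_{11}^2 + 2\alpha_{12}\alpha_{21} + \cdots\}
 &= \{\alpha_{11}, \alpha_{11}^2\} + 2\{\alpha_{11}, \alpha_{12}\alpha_{21}\} + \cdots \\
 &= 2\alpha_{11}\{\alpha_{11}, \alpha_{11}\} + 2\{\alpha_{11}, \alpha_{12}\}\alpha_{21} + 2\alpha_{12}\{\alpha_{11}, \alpha_{21}\} + \cdots \\
 &= 0 + 2\alpha_{12}\alpha_{21} - 2\alpha_{12}\alpha_{21} + \cdots,
\end{align*}

and we see that all terms cancel.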
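
The cancellation can also be checked mechanically. The following sketch is an illustration rather than anything from [KW1]: it evaluates the bracket at one hypothetical integer matrix `x`, using the bracket of the coordinate functionals stated above together with the standard gradient formula $\partial \operatorname{tr}(x_m^k)/\partial \alpha_{ij} = k\,(x_m^{k-1})_{ji}$ for $i, j \le m$. The helper names are made up for the example.

```python
def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def grad_trace_power(x, m, k):
    """F[i][j] = d tr((x_m)^k) / d alpha_ij = k * ((x_m)^(k-1))[j][i],
    padded with zeros outside the leading m-by-m block."""
    n = len(x)
    xm = [row[:m] for row in x[:m]]
    P = [[int(i == j) for j in range(m)] for i in range(m)]
    for _ in range(k - 1):
        P = matmul(P, xm)
    F = [[0] * n for _ in range(n)]
    for i in range(m):
        for j in range(m):
            F[i][j] = k * P[j][i]
    return F

def poisson_bracket(F, G, x):
    """{f, g}(x) = sum F[i][j] G[k][l] (delta_jk x[i][l] - delta_il x[k][j])."""
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for l in range(n):
                    structure = (x[i][l] if j == k else 0) - (x[k][j] if i == l else 0)
                    total += F[i][j] * G[k][l] * structure
    return total

x = [[2, -1, 3], [4, 0, 1], [-2, 5, 7]]   # hypothetical test point in M(3)
pairs = [(m, k) for m in (1, 2, 3) for k in range(1, m + 1)]
brackets = [poisson_bracket(grad_trace_power(x, m1, k1), grad_trace_power(x, m2, k2), x)
            for (m1, k1) in pairs for (m2, k2) in pairs]
print(all(b == 0 for b in brackets))
```

Evaluating at a single point does not prove that the brackets vanish identically, but with integer entries the arithmetic is exact, so a non-zero value would be a genuine counterexample.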

So all of the functions Poisson-commute, but the functions $\operatorname{tr}(x_m^k)$ for $m < n$ are not
Ad-invariant and their associated vector fields on $M(n)$ are non-zero. Someone looking
for a Poisson-commutative algebra of functions on $M(n)$ might perhaps stumble upon
Ritz values as a way to greatly enlarge $\mathrm{Pol}(M(n))^{GL(n)}$.^{10}

In any case, Kostant and Wallach begin by considering the algebra

\[
J(n) = \mathrm{Pol}(M(1))^{GL(1)}\, \mathrm{Pol}(M(2))^{GL(2)} \cdots \mathrm{Pol}(M(n))^{GL(n)} \subset \mathrm{Pol}(M(n)), \tag{3.2}
\]

which is generated by the functions $\operatorname{tr}(x_m^k)$ for $m = 1, \dots, n$ and $k = 1, \dots, m$.^{11} To make this notation
clear, let us enumerate

\[ f_1 = \operatorname{tr}(x_1), \quad f_2 = \operatorname{tr}(x_2), \quad f_3 = \operatorname{tr}(x_2^2), \quad \text{etc.} \tag{3.3} \]

Then a typical element of $J(n)$ is a polynomial expression in the $f_i$,
which maps $M(n) \to \mathbb{C}$. Since the $f_i$ turn out to be algebraically independent, and they
commute, $J(n)$ is isomorphic to a polynomial algebra $\mathbb{C}[f_1, \dots, f_{\binom{n+1}{2}}]$ in the variables $f_i$.

^{9} As remarked in Example 2.3.2, any isospectral flow naturally conserves these quantities. To see that
any Ad-invariant polynomial is a symmetric function of the roots of the characteristic polynomial, a
quick way is to observe that any such polynomial is determined by its value on diagonal matrices.

^{10} Kostant and Wallach explicitly mention in the abstract of [KW1] that they regard the algebra $J(n)$
generated by all the $\operatorname{tr}(x_m^k)$ as a classical analogue of the Gelfand-Zeitlin algebra, which is a commutative
(in the usual sense) subalgebra of the universal enveloping algebra of $M(n)$.

^{11} Taking linear combinations of products of functions on the submatrices may be new to many readers.
A point of notation: we are implicitly using the truncation map $x \mapsto x_m$ to embed each $\mathrm{Pol}(M(m))$

Proposition 3.4 ([KW1, Theorem 0.4]). $J(n)$ is a (maximal) commutative subalgebra
of $\mathrm{Pol}(M(n))$. Furthermore, for any $f \in J(n)$, the Hamiltonian vector field $\xi_f$ associated
to $f$ is globally integrable on $M(n)$, defining an action of $\mathbb{C}$ on $M(n)$.^{12}

This richer structure (considering the set of all Ritz values of $x$ simultaneously) now
stands a chance of defining an integrable system on $M(n)$.

Proposition 3.5. Let $O_x \subset M(n)$ be an adjoint orbit^{13} consisting of regular^{14} elements.
Then the Hamiltonians $f_i$, $1 \le i \le \binom{n}{2}$, form a completely integrable system on $O_x$.
Moreover, the algebra $J(n)$ forms an integrable system on $M(n)$ in the sense of
Definition 2.3.1.

This has a chance of being true because we have exhibited the right number of
commuting Hamiltonians: $\dim O_x = n^2 - n$ for regular $x$, and the number of Hamiltonians is
$1 + 2 + \cdots + (n-1) = n(n-1)/2$. Similarly, the Poisson rank of $M(n)$ is $n^2 - n$, and the total
number of commuting Hamiltonians, including also
$f_{\binom{n}{2}+1}, \dots, f_{\binom{n+1}{2}} \in \mathrm{Pol}(M(n))^{GL(n)}$,
is $\binom{n+1}{2} = n^2 - (n^2 - n)/2$. (Things do not work out this nicely if, for instance, we
replace $M(n)$ by the Lie algebra of symplectic matrices; see [C, Remark 1.7.2].)
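
The bookkeeping in the last paragraph is elementary arithmetic, and a few lines of code confirm it for small $n$. The function name `counts` is illustrative only.

```python
from math import comb

def counts(n):
    dim_orbit = n * n - n             # dimension of a regular adjoint orbit
    on_orbit = sum(range(1, n))       # 1 + 2 + ... + (n - 1) Hamiltonians
    poisson_rank = n * n - n          # Poisson rank of M(n)
    total = comb(n + 1, 2)            # all generators, including the n invariants tr(x^k)
    return dim_orbit, on_orbit, poisson_rank, total

for n in range(1, 8):
    dim_orbit, on_orbit, rank, total = counts(n)
    assert 2 * on_orbit == dim_orbit        # half the dimension of the orbit
    assert 2 * total == 2 * n * n - rank    # dim M(n) - rank/2, as in Definition 2.3.1
print(counts(3))
```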

Complete integrability requires that the Hamiltonians Poisson-commute and that
they be independent. We have already mentioned the first condition; the question of
independence leads to the notion of strong regularity. Kostant and Wallach give many
equivalent characterizations of strong regularity. The ones relevant now are given by
their Theorem 2.7 and the definition immediately preceding it:

Definition/Theorem. A matrix $x$ is strongly regular if and only if the differentials $(df_i)_x$,
$1 \le i \le \binom{n+1}{2}$, are linearly independent, if and only if the tangent vectors $(\xi_{f_i})_x$,
$1 \le i \le \binom{n}{2}$, are linearly independent.

(The missing vector fields $\xi_{f_i}$, corresponding to the elements of $\mathrm{Pol}(M(n))^{GL(n)}$,
are zero, as explained before. Functions whose associated Hamiltonian vector fields are
zero are called Casimir functions. One has to take them into account when dealing
with integrable systems on a Poisson manifold, rather than the more familiar case of a
symplectic manifold.)

In other words, a "strongly regular" matrix is a regular point of the function
$x \mapsto (f_1(x), \dots, f_{\binom{n+1}{2}}(x))$. It does not mean what is sometimes called complete (or strong)
regularity, that each $x_m$ be invertible.



into $\mathrm{Pol}(M(n))$, so each factor in (3.2) is a subalgebra of $\mathrm{Pol}(M(n))$. This construction of a
commutative algebra starting from a system of inclusions of subalgebras is associated with Gelfand-Zeitlin
(a.k.a. Gelfand-Tsetlin), and is not meant to be intuitively obvious.

^{12} Moreover, this action is given by a nice, explicit formula; see (3.7).

^{13} We keep coming back to adjoint orbits. Their importance is that $M(n)$ is Poisson but not symplectic;
the adjoint orbits are the symplectic leaves.

^{14} An element $x \in M(n)$ is regular if and only if $\dim O_x = n^2 - n$. If $x$ is not regular then $\dim O_x$ is
strictly lower, and the Hamiltonians $f_i$ have no chance of being independent there. Hence the hypothesis
here that $x$ be regular.

Kostant and Wallach prove [KW1, Theorem 2.3] that unit upper Hessenberg matrices
are strongly regular, and, therefore, that the set of strongly regular matrices is a dense
open subset of $M(n)$. This proves independence of the "Hamiltonians" $f_i$.

We are thus in the situation described before: any matrix will be described by its
Ritz values, which specify a fibre $M_{\mathcal{R}}(n)$, and the complementary coordinates associated
to the Ritz values (measured from a point on the fibre, which must be specified). The
complementary coordinates will be angle coordinates for the integrable system.

The Gelfand-Zeitlin group $A$, central to their theory, is just the group obtained by
integrating the vector fields corresponding to the functions $\operatorname{tr}(x_m^k) \in J(n)$ for
$1 \le m \le n-1$ and $1 \le k \le m$. This Lie group turns out not to be so mysterious; it is a
commutative group, isomorphic to $\mathbb{C}^{\binom{n}{2}}$. If (3.3) are our chosen generators of $J(n)$, then
a typical element of $A$ can be written

\[
a = \exp(q_1\xi_{f_1}) \exp(q_2\xi_{f_2}) \cdots \exp\bigl(q_{\binom{n}{2}}\xi_{f_{\binom{n}{2}}}\bigr), \tag{3.6}
\]

where $q_i \in \mathbb{C}$. The reader unfamiliar with Lie groups may regard this expression as a
rather formal way of keeping track of the coordinates $q_i$; the group multiplication is given
by addition of the coordinates $q_i$.

The significance of the exponential map in Lie theory is that it relates maps on Lie
groups with maps on Lie algebras. In this specific case, given any matrix $x \in M(n)$ and
any $a \in A$, the matrix $a \cdot x \in M(n)$ is defined, and it is defined in terms of the vector
fields $\xi_{f_i}$ (which span a Lie algebra). This action is computed in [KW1], and we will use
the results. In particular, we have the result, in compact form,^{15}

\[
\exp\bigl(q\,\xi_{\operatorname{tr}(x_m^k)}\bigr) \cdot x = \operatorname{Ad}\bigl(\exp\bigl(-qk(x_m)^{k-1}\bigr) \oplus \mathbf{1}\bigr)\, x, \tag{3.7}
\]

giving an $A$-action on matrices. The key feature is that elements of $A$ act by similarity
transformations, and that those similarity transformations involve powers of leading
principal submatrices of $x$. Moreover, $\mathcal{R}(a \cdot x) = \mathcal{R}(x)$ ("$A$ stabilizes the fibres $M_{\mathcal{R}}(n)$").
When $x$ is sufficiently generic, the fibre is a single $A$-orbit, that is, we have

\[ M_{\mathcal{R}(x)}(n) = \{\, a \cdot x \mid a \in A \,\}. \]

The $\binom{n}{2}$ parameters defining $a \in A$, together with the initial choice of $x$, induce a set of
coordinates along the fibre, namely the $q_j$ in (3.6).
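
The reason such similarities preserve Ritz values can be seen on a small example. The sketch below does not implement (3.7) itself: to stay in exact rational arithmetic it replaces the matrix exponential by another invertible polynomial in $x_m$, here the hypothetical choice $h = x_2 + 3I$, and checks that conjugating by $h \oplus 1$ leaves every $\operatorname{tr}(x_m^k)$, and hence every Ritz value, unchanged. The test matrix and helper names are assumptions made for the illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def inverse(X):
    """Gauss-Jordan inverse in exact rational arithmetic."""
    n = len(X)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(X)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [a - M[r][c] * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def leading(x, m):
    return [row[:m] for row in x[:m]]

def embed(h, n):
    """The block matrix h (+) identity of total size n."""
    m = len(h)
    return [[h[i][j] if i < m and j < m else int(i == j) for j in range(n)]
            for i in range(n)]

def power_traces(x):
    """All tr((x_m)^k) for 1 <= k <= m <= n; these determine the Ritz values."""
    out = []
    for m in range(1, len(x) + 1):
        xm = leading(x, m)
        P = xm
        for _ in range(m):
            out.append(sum(P[i][i] for i in range(m)))
            P = matmul(P, xm)
    return out

x = [[2, -1, 3], [4, 0, 1], [-2, 5, 7]]      # hypothetical test matrix
x2 = leading(x, 2)
h = [[x2[i][j] + 3 * int(i == j) for j in range(2)] for i in range(2)]  # h = x_2 + 3I
y = matmul(matmul(embed(h, 3), x), embed(inverse(h), 3))
print(power_traces(y) == power_traces(x), leading(y, 2) == x2, y != x)
```

Because $h$ commutes with $x_2$, the leading $2 \times 2$ block is untouched, while the full matrix changes by a similarity; this is the mechanism behind the statement that $A$ stabilizes the fibres.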

In general, in view of (3.7), the orbit is given explicitly in terms of certain subgroups
of $GL(n)$. To see which subgroups, note that the matrix $\exp((x_m)^{k-1})$ buried on the
right-hand side of (3.7) is invertible and is a polynomial in $x_m$, and that successively
applying (3.7) with various $q$ and $k$, but keeping $m$ fixed, results in
$\operatorname{Ad}(\exp(p(x_m)) \oplus \mathbf{1})(x)$, where $p(x_m)$ can be any polynomial in $x_m$. Define

\[ G_{x,m} = \{\, g \in GL(m) \mid g \text{ is a polynomial in } x_m \,\}. \]

It will be convenient to consider $G_{x,m} \subset GL(m)$ as a subgroup of $GL(n)$ via the
embedding $g \mapsto \operatorname{diag}(g, \mathbf{1})$.

^{15} The stated formula is an application of [KW1, Theorem 3.3]. Note the minus sign on the right side
of (3.7).

Theorem 3.8 ([KW1, Theorem 3.7]). The orbit $A \cdot x$ of (an arbitrary, not necessarily
generic) matrix $x$ is the image of the mapping

\[
\begin{aligned}
G_{x,1} \times G_{x,2} \times \cdots \times G_{x,n-1} &\to M(n) \\
(g(1), \dots, g(n-1)) &\mapsto \operatorname{Ad}(g(1)) \cdots \operatorname{Ad}(g(n-1))\,(x).
\end{aligned} \tag{3.9}
\]

This means that a general element in the $A$-orbit is obtained by performing a series
of similarity transformations of a particular kind. Therefore, to describe $M_{\mathcal{R}}(n)$, we need
to understand how it decomposes into $A$-orbits, and, to describe an $A$-orbit, we need to
understand the kernel of (3.9).

These considerations lead to another characterization of strong regularity. The
condition is that $\dim A \cdot x = \binom{n}{2}$ (the maximum possible).

Theorem 3.10 ([KW1, Theorem 3.14]). Let $x$ be strongly regular. Then the map

\[ G_{x,1} \times \cdots \times G_{x,n-1} \to A \cdot x \]

is an algebraic isomorphism, so

\[ A \cdot x \cong G_{x,1} \times \cdots \times G_{x,n-1}, \]

where $G_{x,m}$ is the centralizer of $x_m$ in $GL(m)$.

This reduces the description of the orbit of any strongly regular matrix to the
description of the groups $G_{x,m}$, which can be done explicitly for any matrix. The only issue left
is in picking a set of coordinates for $G_{x,m}$ that are somehow natural. (The group $A$ was
originally defined starting from particular symmetric functions of the Ritz values, but, for
generic matrices, it will be more convenient to use instead the Ritz values themselves.)

Example 3.11. Let $x_m$ be a regular semi-simple matrix, and suppose $x_m = g_m\Lambda_m g_m^{-1}$,
where $\Lambda_m$ is diagonal. Then its centralizer is
$G_{x,m} = \{\, g_m D g_m^{-1} \mid D \text{ is diagonal with non-zero entries} \,\} \cong (\mathbb{C}^\times)^m$.
The parameters are the diagonal entries.

Example 3.12. Let $x_m$ be any regular matrix, and suppose $x_m = g_m J_m g_m^{-1}$, where $J_m$ is
in Jordan canonical form. Then $G_{x,m} = \{ g_m D' g_m^{-1} \}$, where $D'$ is a block-diagonal matrix
whose blocks are invertible triangular Toeplitz matrices, one for each Jordan block. If
the Jordan blocks are of sizes $m_i$ with $m_1 + \cdots + m_t = m$, then $G_{x,m} \cong (\mathbb{C}^\times)^t \times \mathbb{C}^{m-t}$.
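
The Toeplitz description in Example 3.12 is easy to test on a single block. The sketch below uses an upper Jordan block purely as a convention for the illustration (the orientation, the block size, and the numerical entries are all assumptions): an upper triangular Toeplitz matrix is a polynomial in the nilpotent shift, so it commutes with the block, whereas a generic diagonal matrix does not.

```python
def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def jordan_block(mu, size):
    """Upper Jordan block: mu on the diagonal, 1 on the superdiagonal."""
    return [[mu if i == j else int(j == i + 1) for j in range(size)]
            for i in range(size)]

def upper_toeplitz(c):
    """T[i][j] = c[j - i] for j >= i, and 0 below the diagonal."""
    n = len(c)
    return [[c[j - i] if j >= i else 0 for j in range(n)] for i in range(n)]

J = jordan_block(5, 4)
T = upper_toeplitz([2, 7, -1, 3])        # invertible because c[0] = 2 is non-zero
D = [[(i + 1) * int(i == j) for j in range(4)] for i in range(4)]  # diag(1, 2, 3, 4)

print(matmul(J, T) == matmul(T, J))      # Toeplitz matrices centralize J
print(matmul(J, D) == matmul(D, J))      # a non-scalar diagonal matrix does not
```

The first entry of the Toeplitz vector must be non-zero and the remaining `size - 1` entries are free, which accounts for one factor of $\mathbb{C}^\times$ and `size - 1` factors of $\mathbb{C}$ per Jordan block.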

In integrable systems language, our Hamiltonians are independent, but $(f_1, \dots, f_{\binom{n+1}{2}})$
still has critical points, where the matrix is not strongly regular. Kostant and Wallach
give even more criteria for strong regularity, but we shall not need them. The point is
that if $x$ is not strongly regular, then describing $A \cdot x$ involves more than just calculating
the groups $G_{x,m}$ and applying Theorem 3.10, even if each $x_m$ happens to be regular.



Note. Even when $x$ is strongly regular, if (G2$_m$) is violated, meaning
$E(x_m) \cap E(x_{m+1}) \neq \emptyset$, then the description of $M_{\mathcal{R}}(n)$ is complicated by the
fact that $A$ does not act transitively on the (strongly regular part $M_{\mathcal{R}}^{\mathrm{sreg}}(n)$ of the) fibre,
which breaks up into several (isomorphic) orbits. For example,

\[
\begin{pmatrix}
0 & & & \\
1 & \ddots & & \\
 & \ddots & \ddots & \\
 & & 1 & 0
\end{pmatrix}
\]

and its transpose belong to distinct orbits. A point in $M_{\mathcal{R}}^{\mathrm{sreg}}(n)$ can be specified by
indicating an element of $A$, the discrete data needed to specify a particular $A$-orbit, and
a point in that orbit. Then, to define coordinates along the fibre, one needs to pick a
representative of each orbit (note that there is an upper Hessenberg matrix in only one of
the orbits); for example, the case $\mathcal{R} = 0$ is worked out in [PS], [C].

When $x$ is generic, a more natural choice of functions would be the Ritz values $r_j(x)$
themselves. These are not globally defined functions, even restricted to the set of generic
matrices (they are defined only on a covering). However, for any generic $x$, along with
an ordering of each $E(x_m)$, it is possible to define vector fields $\eta_j$, $1 \le j \le \binom{n}{2}$, such that
the action of $\mathbb{C}$ on $M_{\mathcal{R}}(n)$ corresponding to the $j$th Ritz value is given by the action
of $\exp(q\eta_j)$, $q \in \mathbb{C}$. What follows is the explicit expression of the associated similarities
(see the proof of [KW2, Theorem 5.5]).

Theorem 3.13.

\[ \exp(q\eta_j) \cdot x = \operatorname{Ad}\bigl(\gamma_j(e^{-q})\bigr)(x), \]

where

\[ \gamma_j(e^{-q}) = \operatorname{diag}\bigl(g_m\,\delta_l(e^{-q})\,g_m^{-1},\; \mathbf{1}\bigr), \]

where, for $j = \binom{m}{2} + l$, $1 \le l \le m$, $g_m \in GL(m)$ is any matrix such that $x_m = g_m\Lambda_m g_m^{-1}$
and $\delta_l(e^{-q})$ is the $m \times m$ diagonal matrix

\[ \operatorname{diag}(1, \dots, 1, e^{-q}, 1, \dots, 1) \]

with $e^{-q}$ in the $l$th position.

Next we put together the similarities associated to all the eigenvalues of a submatrix.

Corollary 3.14. Let $1 \le m \le n-1$ and

\[
a(m) = \prod_{\binom{m}{2}+1 \,\le\, j \,\le\, \binom{m+1}{2}} \exp(q_j\eta_j), \qquad q_j \in \mathbb{C}.
\]

Then

\[
a(m) \cdot x = \operatorname{Ad}\Bigl(g_m \operatorname{diag}\bigl(e^{-q_{\binom{m}{2}+1}}, \dots, e^{-q_{\binom{m+1}{2}}}\bigr)\, g_m^{-1} \oplus \mathbf{1}\Bigr)\, x, \tag{3.15}
\]

where $g_m \in GL(m)$ is any matrix such that $x_m = g_m\Lambda_m g_m^{-1}$.

Proof. Apply Theorem 3.13, noting that the same $g_m$ works for each
$j = \binom{m}{2}+1, \dots, \binom{m+1}{2}$. □

Remark 3.16. A generic fibre $M_{\mathcal{R}}(n)$ is a single orbit, and the element
$\exp(q_1\eta_1) \cdots \exp\bigl(q_{\binom{n}{2}}\eta_{\binom{n}{2}}\bigr)$
of $A$ acts as the identity on the fibre if and only if each $q_j \in 2\pi i\mathbb{Z}$. (See [KW2,
Theorem 5.9].) Corollary 3.14 shows that the entries of the diagonal matrices $D$ in
Example 3.11 are, in fact, the coordinates $e^{-q_j}$ dual to the Ritz values. The condition that
the coordinates $b_m$ and $c_m$ not vanish is fulfilled automatically here by the exponentials.
Geometrically, a generic fibre is an $\binom{n}{2}$-dimensional torus, because it is isomorphic to a
product of $\binom{n}{2}$ copies of the multiplicative group $\mathbb{C}^\times$.

Remark. The coordinates introduced in Example 3.12 are a direct generalization, but it
would be interesting to check whether they satisfy some nice properties analogous to the
generic case.^{16}

Finally, Kostant and Wallach define the coordinates $s_j$ on a generic $M_{\mathcal{R}}(n)$ by picking
an initial point. Recall (Theorem 1.1.3) that $M_{\mathcal{R}}(n)$ contains a unique unit upper
Hessenberg matrix $y$. Then $s_j$ is defined by

\[
s_j\Bigl(\exp(q_1\eta_1) \cdots \exp\bigl(q_{\binom{n}{2}}\eta_{\binom{n}{2}}\bigr) \cdot y\Bigr) = e^{-q_j}
\]

(so $s_j(y) = 1$).

3.1 Relation to arrow coordinates 

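We now prove Claim 1.3.3, that the $(s_j)$ above are identical to the arrow coordinates $(b_1, \dots, b_{n-1})$
defined in Section 1.3 at the end of the matrix development.

Claim. If

\[ x = a(1) \cdots a(n-1) \cdot y \]

with

\[
a(m) = \prod_{\binom{m}{2}+1 \,\le\, j \,\le\, \binom{m+1}{2}} \exp(-q_j\eta_j),
\]

then the arrow-coordinate relation of Section 1.3 holds with
$b_m = \bigl(e^{-q_{\binom{m}{2}+1}}, \dots, e^{-q_{\binom{m+1}{2}}}\bigr)$.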

^{16} In the generic case the "nice property" is that the diagonal entries in $D$ in Example 3.11 are
exponentials of angle coordinates. This says something about the symplectic geometry of generic matrices.
Generalizing this invokes the geometry of certain less generic strata of the space of strongly regular ma- 
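
Proof. Let $g_m \in GL(m)$, $1 \le m \le n-1$, be the special matrices defined by applying
the procedure described in Section 1.1 to the unit Hessenberg matrix $y$. We shall need
the fact

\[ y_m = g_m\Lambda_m g_m^{-1} \tag{3.1.1} \]

as well as the normalization

\[ e_m^T g_m = \mathbf{1}^T, \qquad e_m^T = (0, \dots, 0, 1), \tag{3.1.2} \]

i.e., the last row of $g_m$ is ones.

By Corollary 3.14, $x = \operatorname{Ad}(g(1)) \cdots \operatorname{Ad}(g(n-1))\,y$ with

\[ g(m) = g_m \operatorname{diag}(b_m)^{-1} g_m^{-1} \oplus \mathbf{1} \in G_{y,m}. \tag{3.1.3} \]

Observe that conjugation by an element of $G_{y,m}$ leaves $y_m$ fixed, so

\begin{align*}
(\operatorname{Ad}(g(n-1))\,y)_{n-1} &= y_{n-1}, \\
(\operatorname{Ad}(g(n-2)) \operatorname{Ad}(g(n-1))\,y)_{n-2} &= \operatorname{Ad}(g(n-2)_{n-2})\,y_{n-2} = y_{n-2},
\end{align*}

etc., and we have

\begin{align}
x_m &= \operatorname{Ad}(g(1)_m \cdots g(m-1)_m)\,y_m, \tag{3.1.4} \\
x_{m+1} &= \operatorname{Ad}(g(1)_{m+1} \cdots g(m)_{m+1})\,y_{m+1}. \tag{3.1.5}
\end{align}

But (3.1.1) and (3.1.4) imply that $x_m = Z_m\Lambda_m Z_m^{-1}$ with

\[ Z_m = g(1)_m \cdots g(m-1)_m\, g_m, \tag{3.1.6} \]

whence

\begin{align}
(Z_m^{-1} \oplus 1)\, x_{m+1}\, (Z_m \oplus 1)
 &= \operatorname{Ad}\bigl((Z_m^{-1} \oplus 1)\, g(1)_{m+1} \cdots g(m)_{m+1}\bigr)\, y_{m+1} \notag \\
 &= \operatorname{Ad}\bigl((g_m^{-1} \oplus 1)\, g(m)_{m+1}\bigr)\, y_{m+1}, \quad \text{by substituting (3.1.6),} \notag \\
 &= \operatorname{Ad}\bigl((g_m^{-1} \oplus 1)(g_m \oplus 1)(\operatorname{diag}(b_m)^{-1} \oplus 1)(g_m^{-1} \oplus 1)\bigr)\, y_{m+1}, \quad \text{by (3.1.3),} \notag \\
 &= (\operatorname{diag}(b_m)^{-1} \oplus 1)(g_m^{-1} \oplus 1)\, y_{m+1}\, (g_m \oplus 1)(\operatorname{diag}(b_m) \oplus 1). \tag{3.1.7}
\end{align}

Due to our normalization of $g_m$, we have (suppressing irrelevant entries)

\[
\begin{pmatrix} g_m^{-1} & 0 \\ 0 & 1 \end{pmatrix} y_{m+1} \begin{pmatrix} g_m & 0 \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} g_m^{-1} & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} y_m & * \\ e_m^T & * \end{pmatrix} \begin{pmatrix} g_m & 0 \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} g_m^{-1} y_m g_m & * \\ e_m^T g_m & * \end{pmatrix}
= \begin{pmatrix} \Lambda_m & * \\ \mathbf{1}^T & * \end{pmatrix},
\]

therefore (3.1.7) becomes

\[
\begin{pmatrix} \operatorname{diag}(b_m)^{-1} & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} \Lambda_m & * \\ \mathbf{1}^T & * \end{pmatrix}
\begin{pmatrix} \operatorname{diag}(b_m) & 0 \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} \Lambda_m & * \\ b_m^T & * \end{pmatrix}.
\]

□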



trices, where the eigenvalues are allowed to coalesce. 

3.2 Coordinates along a non-generic fibre

As an illustration that the ideas of Section 3 can be generalized to describe any fibre, we
consider a case studied by Mark Colarusso [C]. He suggests looking at the set of matrices
satisfying

• $x_m$ is regular, $1 \le m \le n$.

• (G2$_m$), $1 \le m \le n-1$.

Any such matrix is strongly regular, and, moreover, the second condition further implies
that $M_{\mathcal{R}}^{\mathrm{sreg}}(n)$ is a single $A$-orbit. (This is the largest set of matrices that can be specified
by naming a fibre and an element of $A$.)



The disadvantage of relaxing (G1$_m$) is that there will no longer be a global set of
complementary coordinates, because the geometry of the fibre will vary when the
multiplicity of an eigenvalue changes. A proper generalization of Kostant and Wallach's
results in [KW2] would consider the geometry of the space of strongly regular matrices
(satisfying (G2$_m$), for simplicity) such that the generalized eigenvalues of $x_m$ have given
multiplicity. However, since we avoided the technical complication of constructing global
coordinates by considering the fibres individually, we can allow ourselves to examine a
single fibre in this slightly less generic case.

The answer is given by Example 3.12, and the corresponding arrow coordinates may
be computed as follows. Recall that any eigenvector of $x_m$ has non-zero last entry.

Claim 3.2.1. If the Jordan form of $x_m$ is $J_{m_1}(\mu_1) \oplus \cdots \oplus J_{m_t}(\mu_t)$, where each $J_{m_i}(\mu_i)$ is
a lower Jordan block, then there exists $g_m \in GL(m)$ such that $x_m = g_m J_m g_m^{-1}$ and the
last row of $g_m$ is

\[
(\underbrace{0, \dots, 0, 1}_{m_1},\; \underbrace{0, \dots, 0, 1}_{m_2},\; \dots). \tag{3.2.2}
\]

Given $m$, let $g_m$ be as in Claim 3.2.1. Then the arrow coordinates are given by
the first $m$ entries of the bottom row of $\operatorname{diag}(g_m^{-1}, 1)\, x_{m+1} \operatorname{diag}(g_m, 1)$. The coordinates
of a unit upper Hessenberg matrix are (3.2.2), and this coincides with the previous
construction in case $x$ is a generic matrix.
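
A small computation illustrates the recipe in the simplest situation, where $x_m$ has distinct eigenvalues so that all Jordan blocks have size one. Everything specific below is a hypothetical choice made for the sketch: the $3 \times 3$ unit upper Hessenberg matrix `y`, the eigenvector matrix `g2` of $y_2$ (eigenvalues $2$ and $-1$, columns scaled so that the last row is all ones), and the target coordinates `b`. The code checks that `y` has arrow coordinates $(1, 1)$ for $m = 2$, and that conjugating by $g_2 \operatorname{diag}(b)^{-1} g_2^{-1} \oplus 1$, as in (3.1.3), produces a matrix whose arrow coordinates are `b`.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def inverse(X):
    """Gauss-Jordan inverse in exact rational arithmetic."""
    n = len(X)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(X)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [a - M[r][c] * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def leading(x, m):
    return [row[:m] for row in x[:m]]

def embed(h, n):
    """The block matrix h (+) identity of total size n."""
    m = len(h)
    return [[h[i][j] if i < m and j < m else int(i == j) for j in range(n)]
            for i in range(n)]

def arrow_row(x, g):
    """First m entries of the bottom row of diag(g^-1, 1) x_{m+1} diag(g, 1)."""
    m = len(g)
    w = matmul(matmul(embed(inverse(g), m + 1), leading(x, m + 1)), embed(g, m + 1))
    return w[m][:m]

y = [[0, 2, 3], [1, 1, -4], [0, 1, 6]]   # hypothetical unit upper Hessenberg matrix
g2 = [[1, -2], [1, 1]]                   # y_2 g2 = g2 diag(2, -1); last row is ones

b = [Fraction(3), Fraction(1, 2)]        # hypothetical non-zero arrow coordinates
d_inv = [[1 / b[0], 0], [0, 1 / b[1]]]
g_act = matmul(matmul(g2, d_inv), inverse(g2))          # g2 diag(b)^-1 g2^-1
x = matmul(matmul(embed(g_act, 3), y), embed(inverse(g_act), 3))

print(arrow_row(y, g2) == [1, 1], arrow_row(x, g2) == b, leading(x, 2) == leading(y, 2))
```

Since `g_act` commutes with $y_2$, the leading $2 \times 2$ block of `x` equals that of `y`, so the same `g2` can be used to read off the coordinates of both matrices.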



References 

[A] V. I. Arnol'd. Mathematical Methods of Classical Mechanics, volume 60 of
Graduate Texts in Mathematics. Springer-Verlag, New York, 199? Translated
from the 1974 Russian original by K. Vogtmann and A. Weinstein, Corrected
reprint of the second (1989) edition.

[BBT] Olivier Babelon, Denis Bernard, and Michel Talon. Introduction to Classical
Integrable Systems. Cambridge Monographs on Mathematical Physics. Cambridge
University Press, Cambridge, 2003.






[BP] Roger Bielawski and Victor Pidstrygach. Gelfand-Zeitlin actions and rational 
maps. arXiv:math.SG/0612365. 

[C] Mark Colarusso. The Gelfand-Zeitlin Algebra and Polarizations of Regular 
Adjoint Orbits for Classical Groups. PhD thesis, University of California, San 
Diego, 2007. 

[GS] V. Guillemin and S. Sternberg. The Gel'fand-Cetlin system and quantization of 
the complex flag manifolds. J. Funct. Anal., 52(1):106-128, 1983. 

[KW1] Bertram Kostant and Nolan Wallach. Gelfand-Zeitlin theory from the perspective
of classical mechanics. I. In Studies in Lie theory, volume 243 of Progr.
Math., pages 319-364. Birkhäuser Boston, Boston, MA, 2006.

[KW2] Bertram Kostant and Nolan Wallach. Gelfand-Zeitlin theory from the perspective
of classical mechanics. II. In The unity of mathematics, volume 244 of Progr.
Math., pages 387-420. Birkhäuser Boston, Boston, MA, 2006.

[Li] J. Liouville. Sur l'intégration des équations différentielles de la Dynamique.
Journal de Mathématiques Pures et Appliquées, XX, Mai 1855.

[LMV] Camille Laurent, Eva Miranda, and Pol Vanhaecke. Action-angle coordinates 
for integrable systems on Poisson manifolds. arXiv:0805.1679 [math.SG]. 

[Lo] Raphael Loewy. Principal minors and diagonal similarity of matrices. Linear 
Algebra Appl., 78:23-64, 1986. 

[P] Beresford N. Parlett. The Symmetric Eigenvalue Problem, volume 20 of Clas- 
sics in Applied Mathematics. Society for Industrial and Applied Mathematics 
(SIAM), Philadelphia, PA, 1998. Corrected reprint of the 1980 original. 

[PS] Beresford Parlett and Gilbert Strang. Matrices with prescribed Ritz values. 
Linear Algebra Appl., 428(7):1725-1739, 2008. 

[T] A. Thimm. Integrable geodesic flows on homogeneous spaces. Ergodic Theory 
and Dynamical Systems, 1:495-517, 1980. 






